ABSTRACT

Recently developed high speed networks are capable of transmitting data at rates of
100 Mbps or more. One such network protocol is Fiber Distributed Data Interface (FDDI).
This network has a physical transmission rate of 100 Mbps. Analytical and simulation
studies have shown that the FDDI protocol should provide actual throughput of 80% to
95% of this physical rate. Can the end user expect to see this kind of performance? If not,
then what kind of throughput can actually be expected and where are the bottlenecks?

In order to answer these and other related questions, two areas were studied. First, a
performance comparison between a 40MHz SPARCstation 10 workstation and a 50MHz
SPARCstation 10 workstation was conducted using the Neal Nelson commercial
benchmark tool. Next, a well-known network measurement tool, ttcp, was used to obtain
data transfer rates while varying several tunable operating system and network parameters.
The parameters varied were: Target Token Rotation Time, TCP/IP window size, NFS
asynchronous threads, Logical Link buffer size and Maximum Transfer Unit size. The
results from the commercial benchmark analysis were used to determine if there are any
differences which can affect transfer rates between the two workstations.

The results from the commercial benchmark tool clearly showed that the newer, higher
speed processor is faster. The network tool ttcp showed that the TCP/IP window size had
the largest impact on throughput performance. Throughput more than doubles from a
window size of 4k to a window size of 20k. This is followed by having more than one
workstation transmitting data simultaneously. Having two workstations transmitting nearly
halves throughput. This is followed by having a faster processor. A measurement of file
transfers using rcp system calls showed that the largest impact on file transfer speed is the
overhead of receiving the transferred file.

I. INTRODUCTION


A.  BACKGROUND 

Data communication networks are now an essential part of our society. Our
technology base has given us workstations which can process data at speeds which make
mainframes from just a few years ago look slow in comparison. Now, not only must we
process the data faster, but we must also distribute the information to other locations at
speeds which just a few years ago were impossible. We truly are in the information era.

In the 1960s and 1970s, the computer industry worked hard to develop new
technologies which would give us faster, more powerful computers. The dramatic advances
in integrated circuit technology made possible the wide availability of larger, more
powerful supercomputers, low-cost workstations, and personal computers [ALBE94].
There were companies which believed that large, centralized processors were the
solution to everyone's problems. At the same time, other companies developed smaller
computers called minicomputers. These minicomputers, and their successors, desktop
workstations, started filling the needs of small companies and universities which couldn't
afford the cost of large mainframes and did not need the processing power of the
large, all-in-one solution provided by the mainframe.

In the world of mainframes, the need to distribute data to other computers was not
critical. The single mainframe would handle all of a company's processing needs. If there
was a need to handle additional processing, the manufacturer of that mainframe provided a
solution which would allow its mainframe to communicate with another of its
mainframes. This of course ensured that the company or university continued to buy all or
most of its computer equipment from the same computer manufacturer.

With the growth of the minicomputers and the workstations came the need to connect
these less expensive and less powerful machines. This provided the motivation and the
driving force behind the development of Local Area Networks (LANs). There were the
proprietary options provided by the computer manufacturers. However, with the need to
provide connectivity between systems came the desire to have connectivity between
systems from different manufacturers. This was very difficult without some sort of
agreed-upon standards. In the late 1970s, the International Standards Organization (ISO)
developed the Open Systems Interconnection (OSI) reference model to serve as the basis
for future open networks. This model would provide the basis for computers from different
vendors to be able to communicate with each other [ALBE94].

Now we have the beginnings of connectivity between computers and the beginnings
of smaller, more powerful computers. In the 1980s, Sun Microsystems started producing
their line of desktop workstations. Within a few years, these workstations were being based
on new Reduced Instruction Set Computer (RISC) technology which allowed Sun
Microsystems and other companies to produce faster, more powerful workstations. Now if
we combine the advancements of the desktop workstations with the advancements made in
networks, we have the true beginnings of the information era.

The question now becomes one of which technology is advancing faster. Are we
producing workstations which can exceed the capability of the networks, or are the
networks staying ahead of the abilities of the workstations? Also, advancements in
workstation technology aren't limited to faster hardware. Are the operating system and its
networking tools keeping pace with current demands?

It is clear that the workstations are faster and more powerful than in the past. It is also
clear that the networks can handle more data at faster rates than in the past. But where do
we stand if we compare a recently released product produced by Sun Microsystems with
one of the current high speed networks such as Fiber Distributed Data Interface (FDDI)?

B.  OBJECTIVE 

The objective of this thesis is to measure actual throughput between high
performance workstations over an FDDI network to determine what bottlenecks, if any,
exist between Sun Microsystems SPARCstation™ 10 multiprocessors running Solaris™
2.3 and the Network Peripheral™ SBus FDDI network interface cards, and to evaluate
Transmission Control Protocol/Internet Protocol (TCP/IP) as a high speed transport
protocol. This process will require an analysis of the workstations being used in this study,
an understanding of current network operating system tools and measurements of data
transfers across the network being tested.

This is not simply a matter of reading the vendor's promotional literature and seeing
which aspect of the distributed processing environment is more capable. Vendors normally
promote those aspects of their products which they can demonstrate as performing at or
above some threshold. This threshold may or may not be of value to the consumer.

C.  SCOPE,  LIMITATIONS  AND  ASSUMPTIONS 

The  scope  of  this  investigation  is  limited  to  performing  testing  and  tuning  at  the  level 
available  to  any  system  administrator.  No  modifications  are  made  to  any  hardware  or 
changes  made  to  the  workstation  kernel  which  are  not  considered  tunable  parameters.  From 
this  investigation,  a  determination  will  be  made  as  to  whether  or  not  there  are  any 
bottlenecks. 

It is assumed that the changes made and the results observed on the SPARCstation 10
multiprocessors running Solaris 2.3 can be extrapolated to other vendors' hardware and
software. If we note that changing the TCP/IP window size on our workstations results in
a tenfold increase in throughput, then we assume comparable results would be observed on
other vendors' workstations.

D.  ORGANIZATION  OF  THESIS 

This thesis is organized into seven chapters. This chapter provides the introduction and
scope of work to be performed. Chapters II and III provide a background on networks in
general, FDDI specifically and the specifics on the workstations involved in this
investigation. Chapters IV and V cover the methodology, test results and analysis of results.
Chapter VI covers what conclusions can be derived from these results.

II. NETWORK PROTOCOLS


A.  NETWORKING  THEORY 

The primary focus behind the development of network protocols has been the
organization of the protocol into a series of layers. This has allowed the design of the
protocols to be simplified by focusing attention at each layer upon that layer's function and
its interaction with the layers above and below. The purpose of each layer is to offer certain
services to the layer above without the higher layer needing to know how those services
are provided.

When designing a network protocol, the network designer must determine how many
layers the protocol will have, what those layers will do and how the layers will
communicate with each other. This last decision, deciding how the layers will
communicate, is one of the more important considerations. A clean-cut interface must be
defined which will minimize the amount of information that must be passed between
layers.

The set of layers and protocols is known as the network architecture. Enough
specification must be given for each layer of the protocols so that vendors can write their
versions of the protocol for their computer architecture. This is what makes the network
architectures beneficial to everyone accessing a network. By having an agreed-upon
network architecture that everyone is willing to use, we can have distributed processing
over heterogeneous processors [MINO91].

B. OPEN SYSTEM INTERCONNECTION

The Open System Interconnection (OSI) reference model, Figure 1, was proposed in
1978 to promote compatibility between network designs. This model was approved as a
standard [ALBE94] in 1983 by the International Standards Organization (ISO). The
reference model is not a protocol or set of rules but a layering of required functions, or
services, that provides a framework with which to define protocols. In practical terms, OSI
is seen as a means of developing communications networks which are not restricted by the
need to conform to a rigid set of manufacturers' proprietary standards and protocols.

Figure 1: ISO-OSI Reference Model

The purpose of these seven layers is to define the various functions that must be carried
out when two machines communicate. Each of the seven layers is architecturally
independent, so that the relevant protocols and service functions of each layer can be
developed independently. The seven layers of the model can be roughly divided into two
parts: the first four layers, physical to transport, provide the telecommunications functions
and operate on a node-to-node basis. The top three layers, session to application, are
concerned mainly with carrying out processing functions and creating a meaningful dialog
between the user and the application.

Below are the seven layers of the OSI model [STAL91]:

• Layer 1: Physical Layer

• Layer 2: Data Link Layer

• Layer 3: Network Layer

• Layer 4: Transport Layer

• Layer 5: Session Layer

• Layer 6: Presentation Layer

• Layer 7: Application Layer


C.  TRANSMISSION  CONTROL  PROTOCOL/INTERNET  PROTOCOL 

The Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite is also
structured as a series of layers. Each layer is designed for a specific purpose. They are
designed so that a specific layer on one machine sends or receives exactly the same object
sent or received by its twin on another machine. This is done without regard to what is
going on in layers above or below the layer under consideration.

The advantage of layering is that it simplifies protocol design. The designer can
concentrate on a specific layer without regard to the design of other layers. For example,
when designing the transport layer of the protocol, the engineer need be concerned only
with assuring that a packet received by one machine is identical to the packet sent by
another. The message contained in the packet is of no concern. The integrity of the message
is of concern only to the designers of the application layer.
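
The peer-to-peer property described above can be illustrated with a minimal sketch. The layer names and the bracketed text headers below are hypothetical and chosen only for illustration; real protocol headers are binary structures. Each layer adds its own header on the sending side, and its twin on the receiving side removes exactly that header without inspecting what it carries.

```python
# Hypothetical layer names and text headers, for illustration only.
LAYERS = ["transport", "network", "link"]  # order of wrapping on the sender


def send(message: bytes) -> bytes:
    """Wrap the message with one header per layer, top layer first."""
    unit = message
    for layer in LAYERS:
        unit = f"[{layer}]".encode() + unit
    return unit


def receive(frame: bytes) -> bytes:
    """Strip headers in reverse order; each layer checks only its own header."""
    unit = frame
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not unit.startswith(header):
            raise ValueError(f"{layer} layer: unexpected header")
        unit = unit[len(header):]
    return unit


frame = send(b"hello")
print(frame)           # b'[link][network][transport]hello'
print(receive(frame))  # b'hello'
```

The transport-level code never examines the message itself, which mirrors the point that message integrity is left to the application layer.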

Members of the TCP/IP family include the Internet Protocol (IP), Transmission
Control Protocol (TCP), User Datagram Protocol (UDP), Address Resolution Protocol
(ARP), Reverse Address Resolution Protocol (RARP), and the Internet Control Message
Protocol (ICMP). The entire family may be referred to as TCP/IP, reflecting the names of
the two main protocols.

The OSI model describes an idealized network communications model. TCP/IP does
not correspond to this model at every level, but instead either combines the functions of
several OSI layers into a single layer, or finds no need to make use of certain layers. In
consequence, TCP/IP can be described by a simpler model as shown in Figure 2 [STEV94].

1. Link Layer


The  Link  layer  is  the  hardware  level  of  the  protocol  model.  It  specifies  the 
physical  connections  between  hosts  and  networks,  and  the  procedures  used  to  transfer 
packets  between  machines. 


Application    Telnet, FTP, e-mail, etc.
Transport      TCP, UDP
Network        IP, ICMP, IGMP
Link           device driver and interface card

Figure 2: The Four Layers of the TCP/IP Protocol Suite


2.  Network  Layer 

This layer is responsible for machine-to-machine communications. It determines
the path a transmission must take, based on the receiving machine's IP address. The
network layer also provides transmission formatting services; it assembles data for
transmission into an internet datagram. If the datagram is outgoing (received from the
higher layer protocols), the network layer attaches an IP header (Figure 3) to it. This header
contains a number of parameters, most significantly the IP addresses of the sending and
receiving hosts. Other parameters include datagram length and identifying information, in
case the datagram exceeds the allowable byte size for network packets and must be
fragmented.
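
As a rough illustration of the fragmentation case mentioned above, the sketch below splits a datagram payload so that each fragment fits within a given maximum packet size. It assumes a 20-byte IP header without options and uses the IP convention that fragment offsets are expressed in 8-byte units; the function name and the 1,500-byte limit in the usage line are arbitrary choices for illustration, not values from this study.

```python
IP_HEADER_BYTES = 20  # assumed: IP header without options


def fragment(payload_len: int, mtu: int):
    """Return a list of (offset_in_8_byte_units, payload_len, more_fragments)."""
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    max_payload = (mtu - IP_HEADER_BYTES) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")
    fragments = []
    sent = 0
    while sent < payload_len:
        size = min(max_payload, payload_len - sent)
        more = sent + size < payload_len
        fragments.append((sent // 8, size, more))
        sent += size
    return fragments


# Hypothetical example: 4,000 payload bytes over a 1,500-byte packet limit.
print(fragment(4000, 1500))
# [(0, 1480, True), (185, 1480, True), (370, 1040, False)]
```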

3.  Transport  Layer 

The transport layer protocols enable communications between application
programs running on separate machines. The transport layer assures that data arrives in
sequence and without error. It does so by exchanging acknowledgments of data reception
and retransmitting lost packets. This type of communication is known as "end-to-end".
Protocols at this level are TCP and UDP.

Figure 3: IP Header


TCP attaches a header onto the transmitted data. This header contains a large
number of parameters (see Figure 4) which help processes on the sending machine connect
to peer processes on the receiving machine. TCP uses 16-bit port numbers as its addressing
method. Servers are normally known by their well-known port number. For example, every
TCP/IP implementation that provides an FTP server provides that service on TCP port 21.
Every Telnet server is on TCP port 23 [STEV94].
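
To make the 16-bit port addressing concrete, the following sketch packs and unpacks the source and destination port fields that open a TCP header. It uses Python's standard `struct` module; the helper names and the client port 1025 are hypothetical, while ports 21 and 23 are the well-known values cited above.

```python
import struct

FTP_PORT = 21     # well-known port cited in the text
TELNET_PORT = 23  # well-known port cited in the text


def pack_ports(source_port: int, destination_port: int) -> bytes:
    """Pack the two 16-bit port fields in network (big-endian) byte order."""
    return struct.pack("!HH", source_port, destination_port)


def unpack_ports(header: bytes):
    """Read the source and destination ports from the first four header bytes."""
    return struct.unpack("!HH", header[:4])


hdr = pack_ports(1025, FTP_PORT)  # 1025 is an arbitrary client port
print(hdr.hex())          # 04010015
print(unpack_ports(hdr))  # (1025, 21)
```

Because each field is 16 bits wide, valid port numbers range from 0 to 65,535; `struct.pack` raises an error for values outside that range.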

4.  Application  Layer 

The application layer lets you use various TCP/IP standard internet services.
These services work with the next lowest level of protocols (transport) to send and receive
data. These services include telnet, rcp, and the Domain Name Service (DNS).

telnet. The Telnet protocol enables terminals and terminal oriented processes to
communicate on a network running TCP/IP.

ftp. ftp transfers files to and from a remote network. Unlike rcp, ftp works even
when the remote computer is running a non-UNIX operating system. A user must "log in"
to the remote computer to make an ftp connection unless a system administrator has set up
the computer to allow "anonymous ftp".

rcp. rcp copies one or more files or hierarchies to and from a remote computer.
The remote computer must be running UNIX. One must be an accepted user of the remote
computer (i.e., the user's name must be in the remote computer's password database, and
the user's machine name must be listed in the remote .rhosts file). If this is not the case, a
user cannot copy anything to or from the remote machine. The user must know the
complete pathname of the file or directory to be copied.

DNS. DNS provides the host name to IP address service. It is a distributed
database that is used by TCP/IP applications to map between hostnames and IP addresses.
The DNS provides the protocol that allows clients and servers to communicate with each
other and to provide electronic mail routing information.

D. FIBER DISTRIBUTED DATA INTERFACE


1.  Fiber  Distributed  Data  Interface  Basics 

Fiber Distributed Data Interface (FDDI) is a 100 Mbps high speed LAN standard
developed under the auspices of the American National Standards Institute (ANSI) X3T9.5
committee. FDDI was developed to create a reliable, fault-tolerant, high-speed network
connecting numerous stations over greater distances than existing standards. Although
FDDI is somewhat similar to the IEEE 802 standards, it is not part of that family of
standards [MINO91].

The ANSI X3T9.5 committee developed specifications for a network based on a
dual counter-rotating fiber optic ring using a timed-token protocol, which is capable of
transmitting data at 100 Mbps in each ring and which can extend to 500 stations over a total
fiber length of 200 km with full system performance. The dual counter-rotating ring can
support connections up to 2 km with multimode fiber and connections up to 60 km using
single-mode fiber.

The FDDI standard allows for two types of traffic: synchronous and
asynchronous. Synchronous traffic should consist of data which is time sensitive, such as
voice or interactive video. Any delay in the throughput of this traffic has an adverse effect
on the quality of the data being transferred. Asynchronous traffic should consist of more
routine data transfers such as email, file transfers and Network File System (NFS) or
Network Information Service (NIS) traffic. These packets of data can sustain some
reasonable delays in transmission without any adverse effects on the applications.

2.  Fiber  Distributed  Data  Interface  Layers 

The standard for FDDI developed by the X3T9.5 committee included four layers,
shown in Figure 5. They are the Media Access Control (MAC) layer, the Physical (PHY)
layer, the Physical Medium Dependent (PMD) layer, and the Station Management (SMT)
document [ALBE94].

Figure 5: Relationship Between FDDI and ISO-OSI Layers

The four layers of FDDI fall under the first two layers of the OSI Model. The
physical layer of FDDI is specified in two documents: the FDDI PMD, which defines the
optical interconnecting components used to form links, and the FDDI PHY, which defines
the encoding scheme used to represent data and control symbols. The Data Link Layer
(DLL) is also divided into two sublayers: a MAC layer and a Logical Link Control (LLC)
layer. The MAC portion provides access to the medium, address recognition, and
generation and verification of frame check sequences.
The LLC specification is not part of the FDDI standard [MINO91].

Below  in  Figure  6  is  an  additional  graphical  representation  of  the  interaction 
between  the  FDDI  standards  as  described  in  [POWE93]. 

a.  The  Physical  Medium  Dependent  Layer 

This layer defines all transmitters, receivers, cables, connectors and other
physical media and hardware. There are currently 6 media options provided for the PMD
layer:

• Multimode fiber (PMD)

• Single-mode fiber (SMF-PMD)

• Low-cost fiber (LCF-PMD)

• Shielded twisted pair (STP-PMD)

• Unshielded twisted pair (UTP-PMD)

• FDDI on Synchronous Optical Network (SONET)


Figure 6: Block Diagram of the FDDI Layers (recoverable labels: IEEE P802.2 LLC; fiber out, fiber in)


The  first  three  options  are  published  or  soon  to  be  published  standards.  The 
last  three  options  are  under  development  [ALBE94]. 

The PMD layer provides the PHY layer all the services required to transport
a coded bit stream from one node to the next node. It converts the encoded data requests
from the PHY layer into either optical or electrical signals depending on the media being
used. It also provides SMT with the services required for proper ring management.
The PMD layer informs both the SMT and PHY layers whenever it detects a signal on the
medium [ALBE94].

b. The Physical Layer

This layer provides media independent functions associated with the OSI
physical layer. The PHY layer decodes the incoming bit stream into a symbol stream for use
by the MAC layer, and it encodes the data and control symbols provided by the MAC layer
for transmission via the PMD layer. The PHY layer continuously monitors the ring status
by listening to incoming signals and passes this information on to the SMT layer [ALBE94].

c.  The  Media  Access  Control  Layer 

This layer provides fair and deterministic access to the network. The access
is fair because a workstation's physical location does not give it any advantage in accessing
the medium over another workstation's location. That the service is deterministic implies
that the time the workstation has to wait for the token can be predicted under error free
conditions.

In FDDI, medium access is controlled by a token. The workstation which
possesses the token can transmit frames. The other workstations on the network repeat the
frame, and the destination workstation copies the frame in addition to repeating it. The
MAC layer of the workstation which generated the frame is responsible for removing the
frame and passing the token downstream to the next workstation when its Token Holding
Time (THT) has expired [ALBE94].

d.  The  Station  Management  Layer 

The SMT layer provides services such as node initialization, bypassing faulty
nodes, coordination of node insertion and removal, fault isolation and recovery and
collection of statistics. The SMT layer provides these functions using services provided by
the PMD, PHY and MAC layers.

3. Fiber Distributed Data Interface Framing


Most communication within FDDI is done using frames (the exception is Physical
Connection Management (PCM) signaling). Within the MAC layer there are three frame
types:

•  Tokens 

•  Management  frames 

•  Data  frames 


Each frame is made up of three parts. The first part is the start of the frame
sequence. The next part is the data or information part of the frame. The last part is the end
of the frame sequence. The data frame is shown in Figure 7 along with the size of each field
in symbols [ALBE94].

Figure 7: FDDI Frame Format
Sizes are in symbols; 1 symbol = 4 bits.
Total frame (minus information) size: 40 symbols * 4 bits / 8 bits = 20 bytes.
Status indicators (1 symbol each): Error Detected, Address Recognized, Frame Copied.


The start part of the frame is 28 symbols in length. Each symbol is a 4 bit unit.
This means the start portion of the FDDI frame is 28 symbols * 4 bits / 8 bits = 14 bytes
long. The end portion of the FDDI frame is 12 symbols or 6 bytes long. Since the maximum
frame length is 9,000 symbols or 4,500 bytes, this leaves 4,480 bytes available for data or
information. This remaining portion of 4,480 bytes is also known as the FDDI Maximum
Transfer Unit (MTU) value [ALBE94].
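
The symbol-to-byte arithmetic above can be checked with a short sketch. The constant and function names are hypothetical; the field sizes are the ones quoted in the paragraph.

```python
BITS_PER_SYMBOL = 4  # one FDDI symbol carries 4 bits


def symbols_to_bytes(symbols: int) -> int:
    """Convert a length in 4-bit symbols to bytes."""
    return symbols * BITS_PER_SYMBOL // 8


start_bytes = symbols_to_bytes(28)        # start-of-frame portion
end_bytes = symbols_to_bytes(12)          # end-of-frame portion
max_frame_bytes = symbols_to_bytes(9000)  # maximum frame length
mtu_bytes = max_frame_bytes - start_bytes - end_bytes

print(start_bytes, end_bytes, max_frame_bytes, mtu_bytes)  # 14 6 4500 4480
```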




4.  Encoding  Method 

Digital data needs to be encoded for proper transmission. The type of encoding
used is determined by the type of media being used, the desired data rate, noise present on
the transmission media and other factors. Since FDDI was originally intended for use over
fiber optics, the encoding method selected needed to provide a digital-to-analog capability.

FDDI uses a two-stage encoding scheme: 4B/5B group encoding along with the
digital signal encoding method known as Non-Return to Zero Inverted (NRZI). NRZI is an
example of differential encoding. The signal is decoded by comparing the polarity of
adjacent signal elements rather than determining the absolute value of a signal element. In
4B/5B, the encoding is done 4 bits at a time, resulting in 5 encoded bits. Then, each element
of the 4B/5B stream is treated as a binary value and encoded using NRZI.

The result is that FDDI is able to achieve a 100 Mbps throughput using a 125-MHz
signaling rate, since every 5 transmitted bits carry 4 data bits. As mentioned earlier, the PHY
layer is responsible for decoding the 4B/5B NRZI signal from the network into symbols
that can be recognized by the station. The
synchronization is derived from the incoming signal and the data are then retimed to an
internal clock through an elasticity buffer.
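
The two encoding stages can be sketched as follows. The table lists the standard 4B/5B code groups for the sixteen data symbols only (control symbols are omitted). Two details are assumptions made for this sketch rather than statements from the text: the high nibble of each byte is encoded first, and the line starts at level 0.

```python
# 4B/5B code groups for the data symbols 0-F (control symbols omitted).
FOUR_B_FIVE_B = {
    0x0: "11110", 0x1: "01001", 0x2: "10100", 0x3: "10101",
    0x4: "01010", 0x5: "01011", 0x6: "01110", 0x7: "01111",
    0x8: "10010", 0x9: "10011", 0xA: "10110", 0xB: "10111",
    0xC: "11010", 0xD: "11011", 0xE: "11100", 0xF: "11101",
}


def encode_4b5b(data: bytes) -> str:
    """Map each 4-bit nibble to its 5-bit code group (high nibble first, assumed)."""
    return "".join(
        FOUR_B_FIVE_B[byte >> 4] + FOUR_B_FIVE_B[byte & 0x0F] for byte in data
    )


def nrzi(bits: str, level: int = 0) -> list:
    """NRZI: a 1 toggles the line level, a 0 leaves it unchanged."""
    levels = []
    for bit in bits:
        if bit == "1":
            level ^= 1
        levels.append(level)
    return levels


code_bits = encode_4b5b(b"\x3c")
print(code_bits)        # 1010111010
print(nrzi(code_bits))  # [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]

# Four data bits ride on every five code bits, so a 125-MHz rate yields:
print(125 * 4 / 5)      # 100.0 (Mbps)
```

Every code group in the table contains at least two 1 bits, so the NRZI output changes level at least twice per symbol, which helps the receiver recover its clock from the incoming signal.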

E.  NETWORK  OVERHEAD 

The process of transferring data from one workstation to another involves all the layers
of protocols described previously. Even though the protocols are broken into layers to
distribute functionality, the result is increased overhead. As discussed earlier, for each layer
of protocol, there is an associated overhead at that layer as shown in Figure 8.

Maximum FDDI frame: 4500 bytes
Maximum data transferred: 4440 bytes
Percentage of overhead: 1.33%

Figure 8: Composition of FDDI Frames and Percentage of Overhead

The amount of overhead involved in transferring data is dependent upon the protocols
used and the network media being used as the transfer agent. For FDDI, the overhead is
calculated as follows:


Data           Overhead    Level          Total Overhead
4,440 bytes    0           Application    0 bytes
4,440 bytes    20 bytes    TCP            20 bytes
4,440 bytes    20 bytes    IP             40 bytes
4,440 bytes    20 bytes    FDDI           60 bytes

In this example, the frame of data being sent is 4,500 bytes; the total amount of data being
transferred is 4,440 bytes and the total amount of overhead is 60 bytes. Therefore, the
percentage of overhead is the amount of overhead (60 bytes) divided by the total frame size
(4,500 bytes). Overhead = 60 bytes / 4,500 bytes = 1.33%. If we were to only send 11 bytes
of data, then the overhead would be 60 bytes / 71 bytes = 84.5%. It is clear that the more
data sent in each FDDI frame, the lower the percentage of overhead associated with that
frame. Note that in this example the overhead from the application layer was not included.
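
The same calculation can be expressed as a small function. This is only a sketch of the arithmetic in this section; the names are hypothetical, and the per-layer overhead values are the 20 bytes each for TCP, IP and FDDI used in the example.

```python
# Per-layer overhead in bytes, as used in the example of this section.
HEADER_BYTES = {"TCP": 20, "IP": 20, "FDDI": 20}


def overhead_percent(data_bytes: int) -> float:
    """Overhead as a percentage of the total frame (data plus headers)."""
    overhead = sum(HEADER_BYTES.values())
    return 100.0 * overhead / (data_bytes + overhead)


print(round(overhead_percent(4440), 2))  # 1.33
print(round(overhead_percent(11), 1))    # 84.5
```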

F. FIBER DISTRIBUTED DATA INTERFACE PARAMETERS

This section will give a brief explanation of FDDI parameters as covered in the ANSI
standards. The MAC layer must implement a number of these parameters as timers and
counters. The three main goals of these timers and counters are to [ALBE94]:

•  Allow  the  initialization  of  the  token  rotation  timer 

•  Permit  fast  recovery  from  ring  errors 

•  Aid  in  the  collection  of  ring  statistics  for  SMT 

Figure 9 below lists the important timer values and variables used in the data
transmission process. According to the FDDI standards, every time a node releases a token,
it loads the value of T_Opr into the Token Rotation Timer (TRT). This timer then decrements
until it reaches zero. If it reaches zero before a valid token is received, the token is said to
be late and the late counter (Late_Ct) is incremented. If TRT expires a second time before
a valid token is received, an error condition exists and recovery procedures are initiated.
The token holding timer (THT) is used to control asynchronous transmission in a dynamic
manner. When a valid token is received and Late_Ct is not set, the token is said to be
early and the node may transmit asynchronous data. In this case, THT is set to T_Opr minus
TRT and the node may transmit until THT expires. TVX is a hardware backup timer that
is used to prevent nodes from blabbering on the network due to some error or
miscalculation of THT [ALBE94].


Parameter    Description

TTRT         Target token rotation time
TRT          Token rotation timer
T_Opr        Operative TTRT negotiated during claim process
Late_Ct      Late counter
THT          Token holding timer
TVX          Transmission valid timer

Figure 9: Timers and Counters Used in Data Transmission
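
The timer rules described in this section can be sketched in C. This is a simplified model, not code from any FDDI implementation: it treats TRT as the time elapsed since the node last released the token instead of as a hardware countdown, and the names classify_token, token_state, and token_result are hypothetical.

```c
#include <assert.h>

typedef enum { TOKEN_EARLY, TOKEN_LATE, TOKEN_RING_ERROR } token_state;

typedef struct {
    token_state state;  /* outcome of this token arrival           */
    int late_ct;        /* value of Late_Ct after the arrival      */
    double tht_ms;      /* asynchronous transmit budget (THT), ms  */
} token_result;

/* Classify a token arrival given the negotiated T_Opr and the time
 * elapsed since this node last released the token. */
token_result classify_token(double t_opr_ms, double elapsed_ms)
{
    token_result r = { TOKEN_EARLY, 0, 0.0 };
    assert(t_opr_ms > 0.0 && elapsed_ms >= 0.0);

    if (elapsed_ms < t_opr_ms) {
        /* TRT has not expired: token is early, THT = T_Opr - TRT. */
        r.tht_ms = t_opr_ms - elapsed_ms;
    } else if (elapsed_ms < 2.0 * t_opr_ms) {
        /* TRT expired once: token is late, no asynchronous budget. */
        r.state = TOKEN_LATE;
        r.late_ct = 1;
    } else {
        /* TRT expired a second time: recovery must be initiated. */
        r.state = TOKEN_RING_ERROR;
        r.late_ct = 1;
    }
    return r;
}
```

With T_Opr at 8 ms, a token arriving after 3 ms is early with a 5 ms asynchronous budget, a token arriving after 10 ms is late, and no token within 16 ms is treated as a ring error.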

III. NETWORK EQUIPMENT


A.  NETWORK  OVERVIEW 

The Naval Postgraduate School (NPS) FDDI research network consists of three
machines operating on a ring. The names of the three machines on the FDDI LAN are
“Black”, “White” and “Gold”. Gold is the server on the network. The network is set up as
shown in Figure 10.


1.  Fiber  Optics  Equipment 

The specifications for the fiber optics equipment can be found in the PMD
standards. Originally, only optical fiber was specified as a physical medium for FDDI. Now
it is possible to also use shielded twisted-pair wire for short-distance transmissions. The
requirements for twisted-pair wire can be found in the STP-PMD standards.

The recommended fiber size for FDDI is 62.5/125 µm. The operating wavelength
is specified as 1300 nm and the minimum allowable power for the transmitter is -16 dBm.
PIN diodes are to be used in the link. PIN diodes were chosen over avalanche photodiodes
since PIN diodes are a more mature technology and would result in a lower cost receiver.

The  bit-error  rate  (BER)  of  the  network  is  4  x  10'^^  and  the  maximum  number  of  nodes  is 
500  [POWE93]. 

2. Network Peripherals' Interface

The Network Peripherals Inc. (NPI) SBus FDDI Network Interface conforms to
Sun Microsystems’ requirements for an SBus adapter. It mounts in an SBus slot and
implements burst mode Direct Memory Access (DMA) for the highest system performance
[NPI93].

As stated earlier, FDDI is designed to provide the capability for both synchronous
and asynchronous data transfer. This is not the case with NPI's SBus FDDI Interface card.
Furthermore, it is not the case for all known current implementations of FDDI. This makes
the relationship of the timers and counters described earlier not as well defined. Without
synchronous and asynchronous transfers, there is no need for Late_Ct and THT. Below is
a list of the parameters which NPI lists as tunable. Note that there is no
parameter listed here which specifies how long a node can maintain the token.

sbf_num_llc_rx   /* For LLC network traffic:
                 /* number of 4k receive buffers, maximum is 64 4k buffers
                 /* Default is 48 4k buffers per NP-SB adapter

sbf_num_smt_rx   /* For SMT network traffic:
                 /* number of 4k receive buffers, maximum is 64 4k buffers
                 /* Default is 4 4k buffers per NP-SB adapter

sbf_mtu          /* Maximum protocol packet size, default is 4352 bytes

sbf_T_Notify     /* SMT Neighbor Notification Timer, default is 30 seconds

sbf_num_mcast    /* number of multicast entries, default is 16

These parameters can be tuned by entering the appropriate line below in
/etc/system for each parameter.

1. To change number of receive buffers to 64:
set sbf:sbf_num_llc_rx = 64

2. To change MTU size to 4192 bytes:
set sbf:sbf_mtu = 4192

3. To change T_Notify timer to 10 seconds:
set sbf:sbf_T_Notify = 10

After contacting NPI it was learned that there is another parameter which is not
advertised called t_req. This parameter determines how long the node is allowed to hold
the token.

3. Silicon Graphics' Interface

FDDIXPress 3.0.1 is a network interface controller (board and software)
providing FDDI connectivity for Silicon Graphics workstations and servers. For the IRIS
Indigo, FDDIXPress has two configurations of the FDDI board: FDDIXPI and FDDIXPID.
The FDDIXPI board allows one single-attachment FDDI connection to an FDDI
concentrator; the FDDIXPID board provides a dual-attachment FDDI connection directly
to the dual ring, or one or two connections to an FDDI concentrator. An Indigo can
accommodate one of these boards.

When FDDIXPress is installed, an Indigo can also use its built-in Ethernet
network interface, thus having two network interfaces. FDDIXPress for IRIS Indigo has
been designed for customer installation.

B.  WORKSTATION  OVERVIEW 

1.  SUN  SPARCstation  10  system 

The SPARCstation 10 systems used in this test were the new multiprocessing
systems running Solaris 2.3. We had two SPARCstation 10 systems, Gold and White,
available for our FDDI research. Both systems have two processors, two internal hard disk
drives and 224 Mbytes of Dynamic Random Access Memory (DRAM). Gold has two 50MHz
processors and two 1 GB internal drives. White has two 40MHz processors, one 1 GB internal
drive and one 425 MB internal drive.

a.  Software  Architecture 

Solaris 2.3 is a multilayered operating system that includes SunOS 5.3, Open
Network Computing (ONC), OpenWindows, and the DeskSet. At the core of Solaris is
SunOS, the collection of programs that actually manages the system, which includes the
kernel, the file system, and the shells.

SunOS is a collection of UNIX programs that control the Sun workstation and
provide a link between the user, the workstation, and its resources. It has its roots firmly
placed in the two most popular UNIX families: Berkeley UNIX (BSD) and AT&T’s UNIX.
Early versions of SunOS blended some of AT&T’s UNIX with Berkeley UNIX and offered
additional enhancements.

AT&T and Sun Microsystems later worked together to create a new industry
standard, AT&T UNIX System V Release 4, commonly known as SVR4. SunOS 5.3
merges SunOS 4.1 and SVR4. Most of the new changes in SunOS come from SVR4. As a
result, Solaris 2.3 is based on SVR4 but contains a few additional BSD/SunOS features
[HESL93].

b. Hardware Architecture

The SPARCstation 10 architecture is shown in Figure 11 [SUNM90]:

SuperSPARC microprocessor: This is a high-performance CPU chip that
has the following features:

•  A  single  chip  with  integer,  floating  point,  memory  management,  and  caches. 

• Superscalar pipeline with up to three instructions launched per clock cycle.

• 20-Kbyte instruction cache and 16-Kbyte data cache.

•  64  entry  TLB  with  hardware  page-table  walking. 

•  Integral  support  for  cache-coherent  multiprocessing. 

The SuperSPARC processor has a companion chip, the SuperCache
controller, which provides for a 1-Mbyte external cache. Additionally, SPARC modules
with SuperCache controllers can operate asynchronously to the system clock.

MBus. The MBus is a high performance memory bus which was first
introduced in Sun’s SPARCserver 600MP family. It is a synchronous, 40-MHz 64-bit bus
that is capable of a peak transfer rate of 320 Mbytes/second. Typically, the MBus can
sustain a rate of 100 Mbytes/second.
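
The peak MBus figure follows directly from the clock rate and bus width. The sketch below shows the arithmetic, assuming one full-width transfer per clock cycle; the function name bus_peak_mbytes_per_sec is hypothetical.

```c
#include <assert.h>

/* Peak transfer rate in Mbytes/second for a synchronous bus that
 * moves one full-width word on every clock cycle. */
double bus_peak_mbytes_per_sec(double clock_mhz, unsigned width_bits)
{
    assert(clock_mhz > 0.0 && width_bits % 8 == 0);
    return clock_mhz * (width_bits / 8);
}
```

For the MBus, bus_peak_mbytes_per_sec(40.0, 64) yields 320, so the typical sustained rate of 100 Mbytes/second is a little under one third of the peak.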

This bus provides support for symmetric multiprocessing by means of a
“snooping”  protocol.  Whenever  a  processor  puts  an  address  onto  the  MBus,  all  other 
processors  “snoop”  the  bus,  checking  to  see  if  data  at  the  snooped  address  is  in  their  cache. 

Main memory architecture: The Sun-4m architecture uses a 144 bit wide
memory data path (128 bits of data and 16 bits of error detection and correction). The use
of a 128-bit wide memory data path has two advantages. First, the 32-byte cache fill can be
accomplished quickly. Second, error corrections can be performed on each 64-bit word.
Single bit errors can be corrected and double-bit (4-bit) errors can be detected.

Figure 11: Sun-4m Architecture Used in the SPARCstation 10 System


I/O  architecture:  A  single  Application-Specific  Integrated  Circuit  (ASIC) 
serves  as  the  interface  between  the  MBus  and  the  SBus.  The  MBus  is  used  as  the  processor 
memory  interconnect,  while  the  SBus  is  used  only  for  I/O.  The  SPARCstation  10  system 
supports four SBus slots. They provide the means to interface a variety of I/O options,
including network interfaces such as FDDI, graphics adapters and laser printer interfaces.

2.  Silicon  Graphics  IRIS  Indigo 

The Silicon Graphics IRIS Indigo used in this test was an IRIS-4D, model
4D/RPC. The IRIS Indigo uses the R3000A CPU RISC processor from MIPS Computer
Systems Inc. It is assisted by a 32 Kbyte data and instruction cache and a MIPS R3010A
floating-point unit. To speed up data transfers, IRIS Indigo uses custom ASICs designed
by Silicon Graphics. These chips manage memory and processor interrupts, handle I/O and
control the bus, often without CPU intervention [SILIC91].

We had one IRIS Indigo, Black, available for our FDDI research. This system has
one  33  MHz  processor,  one  1  GB  internal  hard  disk  drive  and  32  Mbytes  of  RAM.  The 
workstation  has  the  following  features: 

• A single 33 MHz chip with integer, floating point, memory management and
caches.

• 32-Kbyte instruction cache and 32-Kbyte data cache.

• Integral support for cache-coherent multiprocessing.

a.  Software  Architecture. 

The IRIS Indigo uses IRIX 4.0 which is Silicon Graphics’ implementation of
the UNIX operating system. IRIX 4.0 is based on AT&T UNIX System V.3, but also
includes numerous 4.3 BSD extensions, such as TCP/IP network protocols and NFS, which
provide transparent access to files across a heterogeneous network.

b.  Hardware  Architecture. 

The IRIS Indigo CPU board, Figure 12 [SILIC91], contains four functional
sections:

• The processor core, which contains the CPU and FPU.

• Main memory, which contains DRAM and supporting circuitry.

• The I/O system, which contains peripheral ports and hardware designed to read
incoming data and manage incoming and outgoing data.

• The audio system, which contains audio ports and digital signal processing
hardware.

Figure 12: The IRIS Indigo CPU Board


Three busses connect parts of the CPU board:

• The CPU bus, which connects the CPU, FPU, cache control, and bus control
hardware.

• The GIO32 bus, which is the main system bus connecting the processor core,
main memory, I/O system, expansion slots, and graphics board.

• The Peripheral bus, which connects the peripheral ports, audio system, and
other I/O components.

The CPU bus and the GIO32 bus have separate clocks and run at different
speeds so that each part runs at maximum capability. The CPU and other chips can be
upgraded independently as technology improves.

Instruction and Data Caches. Each cache is a 32 Kbyte cache. The
instruction cache holds frequently used instructions and the data cache holds frequently
used data. The IRIS Indigo uses a write-through scheme in the data cache to ensure that
writes made to the cache are also written to the corresponding page in main memory.

The GIO32 Bus. This bus is the IRIS Indigo’s main system bus, and is
designed for high speed data transfer. It connects the main systems of the IRIS Indigo: the
processor core, main memory, the I/O systems, the graphics system, and any systems
plugged into the expansion slots. This bus is a synchronous, multiplexed address/data, burst
mode bus that operates at 33.3 MHz, clocked independently of the CPU. The bus protocol
supports data transfers at a maximum sustained rate of one word per clock.

The I/O System. The I/O system ties together a variety of I/O ports and the
chips that drive them, a system clock, system Programmable Read-Only Memory (PROM)
for booting up, and static RAM.

The HPC1 ASIC. The HPC1 is a custom Silicon Graphics chip that connects
to the GIO32 bus, the peripheral bus, and directly to several of the I/O ports. It is the heart
of the I/O system, and quickly transfers data between main memory and a rich collection
of peripheral devices.

Expansion Slots. The two expansion slots, connected directly to the GIO32
bus, provide direct access to the system for Silicon Graphics and third party plug-in boards
for such applications as high-speed networking, image compression, video deck control,
and additional I/O. Slot 0 is used for our FDDI connection.

IV. TEST DESIGN PLAN


A.  TEST  STRATEGY 

The objective is to find the upper limit of throughput by measuring actual throughput
between high performance workstations over an FDDI network and to determine what
bottlenecks, if any, exist between Sun Microsystems SPARC 10 multiprocessors running
Solaris 2.3 and NPI’s FDDI network interface cards. This process will include
identifying the various parameters which affect throughput and testing these parameters in
enough detail to determine their impact on network performance. As explained in Chapter
II, there are various levels of software that are involved in transferring data. As shown in
Figure 13, as data is transferred from White to Gold, there are several impacts on the data
transfer rate.

The  key  to  this  test  design  plan  will  be  gathering  the  appropriate  data  to  determine 
what  impact  these  various  parameters  have  on  the  transfer  rate,  and  how  to  measure  them. 
Three  different  methods  will  be  used  to  measure  the  performance  of  data  being  transferred 
between  workstations  across  the  FDDI  network.  First,  a  commercial  benchmarking  tool 
will  be  used  to  provide  performance  results  on  the  workstations.  Second,  a  public  domain 
networking  benchmark  tool  will  be  used  to  show  the  transfer  rate  of  the  network.  Third,  a 
simple program which issues an rcp command and measures the time of the file transfer
will  be  used. 

B.  NEAL  NELSON  BENCHMARK 

The primary benchmarking tool to be used for providing the performance results on
the workstations will be the Neal Nelson Business Benchmark. This benchmark tool has
been around for over 9 years and has been used as a tool for verifying vendor compliance
during government contract awards. The Business Benchmark differs from other popular
benchmarks in that its primary focus is not to provide a single number speed rating for a
system, nor is its primary purpose to emulate a particular user group or duplicate the load
created by a certain task mix. The Business Benchmark was designed to incrementally stress
various parts of a computer system and record how the system performs. The benchmark
was intended to uncover both the strengths and the weaknesses of a computer architecture
and report them separately so that they can be understood and analyzed [GRAY91].


Figure 13: Flow of Data Across the FDDI Network Using the RCP Command


The Neal Nelson Business Benchmark is a multitasking benchmark with a parent/child
design. A parent process creates child processes and instructs them to run tests in various
combinations. There can be from one to one hundred child processes running
simultaneously during a benchmark session. During a test session the parent process creates
a single child process and instructs the child to perform a series of tests. Then the parent
creates a second child and directs both children through the same series of tests. This
process is repeated until a desired maximum number of child processes is reached, or until
the system runs out of some resource such as disk space [NNBM94].
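
The parent/child ramp described above can be sketched with fork and wait. This is not the Neal Nelson implementation: as a simplification it forks a fresh set of children at each load level instead of keeping earlier children alive, and child_workload, run_load_step, and run_session are hypothetical names, with a trivial loop standing in for a real benchmark test.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder workload; stands in for one series of benchmark tests. */
static void child_workload(void)
{
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 100000UL; i++)
        sum += i;
}

/* Run one load level: fork `load` children, then wait for all of them.
 * Returns the number that exited cleanly, or -1 if a fork failed. */
int run_load_step(int load)
{
    int started = 0, ok = 0;
    assert(load >= 1 && load <= 100);

    for (int i = 0; i < load; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                  /* out of process slots or memory */
        if (pid == 0) {
            child_workload();
            _exit(0);               /* child ends here */
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return started == load ? ok : -1;
}

/* Ramp the load from 1 child up to max_load children, stopping early
 * if a resource runs out. Returns the total number of child runs. */
int run_session(int max_load)
{
    int total = 0;
    for (int load = 1; load <= max_load; load++) {
        int done = run_load_step(load);
        if (done < 0)
            break;
        total += done;
    }
    return total;
}
```

A session ramped to four children performs 1 + 2 + 3 + 4 = 10 child runs in total; timing each call to run_load_step would give one measurement per load level.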

The benchmark consists of thirty tests, which are divided into three groups.


Group 1: Tests a mix of activities that are intended to approximate the processing
activities for the following five types of users. Group 1 includes the following tests:


1)  Simulated  Office  Automation  Workload 

2)  Simulated  Database  Workload 

3)  Simulated  Software  Development  Workload 

4)  Simulated  Transaction  Processing  Workload 

5)  Simulated  Calculation  Workload  (Math/Statistics/CAD/CAM) 

Group  2:  Tests  designed  to  perform  various  types  of  calculation  tasks  and  thereby 
profile  the  performance  of  the  computer’s  calculation  subsystem.  Group  2  includes  the 
following  tests: 


6)  Write  to  Shared  Memory 

7) Read from Memory, Small Instruction Area, Small Data Area

8)  Read  from  Memory,  Small  Instruction  Area,  Larger  Data  Area 

9)  Read  from  Memory,  Larger  Instruction  Area,  Small  Data  Area 

10)  Read  from  Memory,  Larger  Instruction  Area,  Larger  Data  Area 

11) Make Machine Page or Swap with 'malloc' and 'free'

12)  Combined  Integer  and  Floating  Point  Math 

13)  Math  Library  Functions 

14) Semaphores, Shared Memory, Context Switch

15)  Write  to  and  Read  from  Pipes,  Context  Switch 

16)  Sample  System  Calls 

17) Increasing Depth of Function Calls


Group 3: Tests that perform a series of disk input and output functions to profile the
performance of the disk subsystem. Group 3 includes the following tests:


18) 1024 byte Sequential Reads from Unix File(s)

19) 1024 byte Sequential Writes to Unix File(s)

20) 8192 byte Sequential Reads from Unix File(s)

21) 8192 byte Sequential Writes to Unix File(s)

22) 4096 byte Synchronized Reads from Unix File(s)

23)  4096  byte  Synchronized  Reads  from  Raw  Device(s) 

24)  16384  byte  Synchronized  Reads  from  Unix  File(s) 

25)  16384  byte  Synchronized  Reads  from  Raw  Device(s) 

26)  4096  byte  Pseudo  Random  Reads  from  Unix  File(s) 

27)  4096  byte  Pseudo  Random  Reads  from  Raw  Device(s) 

28) Profile Disk Cache for Unix File(s)

29)  Profile  Disk  Cache  for  Raw  Device(s) 

30) 8192 byte Sequential Writes then 'sync'

During each of the above tests, measures will be obtained at load factors from 1 to 20.
This load factor number indicates the number of copies of the benchmark program which
are running simultaneously. Each load factor unit might approximate the workload of one
or two heavy users or possibly twenty light users. The measurements will be in seconds to
complete the measured task. The system which takes less time to accomplish the measured
task is the faster system.

C.  NEW  TEST  TRANSMISSION  CONTROL  PROTOCOL 

New Test TCP (nttcp) uses Test TCP (ttcp) as the basic tool for determining measured
throughput over any physical network media. nttcp provides the option of dynamically
changing the TCP/IP window size during the throughput test. ttcp was developed by the
U.S. Army’s Ballistic Research Lab (BRL), which is now the U.S. Army’s Research Lab
(ARL), and is considered one of the default network performance benchmarks.

nttcp tests TCP and UDP performance by timing the transmission and reception of
data between two systems using the UDP or TCP protocols. It differs from common “blast”
tests, which tend to measure the remote inetd as much as the network performance, and
which usually do not allow measurements at the remote end of a UDP transmission.

For testing, the transmitter should be started with -t after the receiver has been started
with -r. For testing various window sizes, nttcp allows a -w option which permits the user
to specify the desired TCP/IP window size. Some of the other options which were used
during this investigation are shown below:


-t Transmit mode.

-r Receive mode.

-u Use UDP instead of TCP.

-n Number of source buffers transmitted.

-l Length of buffers in bytes.

-w  TCP/IP  window  size  in  k  bytes. 

-p  Port  number  to  send  to  or  listen  on. 

Below  are  the  commands  used  in  a  typical  session  during  this  investigation: 

Receiving  system  (gold): 

gold: nttcp -r -p3000 -w12

Transmitting  system  (white): 

white: nttcp -t -p3000 -l65536 -n1024 -w12 gold

The shell scripts along with the nttcp program are in Appendix A. The shell scripts
doit.sh and ttest.sh were written by personnel at the U.S. Army Research Lab (ARL) and
modified to fit this investigation. These scripts were designed to be used with the program
nttcp. The first script, doit.sh, provides the various combinations of data sizes to be
transferred along with starting and stopping times of each run. This script runs through six
iterations of identical data sets. The shell script ttest.sh provides the calls to the program
nttcp. Using the data length and number of packets specified in the shell script doit.sh,
ttest.sh makes numerous calls to nttcp, varying the window size from 4k to 60k in 8k
increments. This combination of amount of data transferred, number of test runs and
number of window sizes provides a total of 576 measured data transfers during a single run:
amount of data transferred (12 sizes) * number of test runs (6 runs) * number of window
sizes (8 different window sizes) = 576 measured data transfers. Below is an example of the
results from a single call to nttcp with the amount of data to be transferred equal to
33,554,432 bytes of data and the TCP/IP window size being varied from 4k to 60k in 8k
increments:


Window Size (bytes)    Transfer Rate (Mb/s)

 4096                  32.7680
12288                  29.1271
20480                  37.4491
28672                  43.6907
36864                  52.4288
45056                  43.6907
53248                  43.6907
61440                  37.4491
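
The window sweep and the count of measured transfers can be checked with a short C sketch. It is illustrative only and is not taken from the shell scripts in Appendix A; window_sizes and transfers_per_run are hypothetical names, and 1k is assumed to mean 1024 bytes, which matches the byte values in the example results.

```c
#include <assert.h>
#include <stddef.h>

/* Fill out[] with window sizes in bytes from first_k to last_k
 * (inclusive), stepping by step_k, where 1k = 1024 bytes.
 * Returns the number of sizes written, never more than cap. */
size_t window_sizes(unsigned first_k, unsigned last_k, unsigned step_k,
                    unsigned long out[], size_t cap)
{
    size_t n = 0;
    assert(step_k > 0 && first_k <= last_k);
    for (unsigned k = first_k; k <= last_k && n < cap; k += step_k)
        out[n++] = 1024UL * k;
    return n;
}

/* Measured transfers in one run: data sizes x iterations x windows. */
unsigned long transfers_per_run(unsigned data_sizes, unsigned iterations,
                                size_t windows)
{
    return (unsigned long)data_sizes * iterations * (unsigned long)windows;
}
```

Sweeping from 4k to 60k in 8k steps produces eight sizes, 4096 through 61440 bytes, and 12 data sizes over 6 iterations then gives 576 transfers per run.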


The  TCP/IP  window  size  is  adjusted  during  these  runs  using  the  setsockopt  system 
call.  After  the  window  size  has  been  adjusted,  the  getsockopt  system  call  is  performed  to 
verify  that  the  TCP/IP  window  size  has  been  changed  as  requested.  Figure  14  shows  an 
example  of  the  setsockopt  and  getsockopt  system  calls  used  in  the  nttcp  program. 


if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, (char *) &sendwin, sizeof(sendwin)) < 0)
    printf("set send window size didn't work\n");
if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, (char *) &rcvwin, sizeof(rcvwin)) < 0)
    printf("set rcv window size didn't work\n");

/* Read the values back to verify the requested sizes took effect */
if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, (char *) &sendwin, &optlen) < 0)
    printf("get send window size didn't work\n");
else printf("send window size = %d\n", sendwin);

if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, (char *) &rcvwin, &optlen) < 0)
    printf("get rcv window size didn't work\n");
else printf("receive window size = %d\n", rcvwin);


Figure  14:  Example  of  setsockopt  and  getsockopt  System  Calls 

D. REMOTE COPY PROTOCOL TRANSFER

Another program being used to measure the data transfer rate is a simple C program
which issues an rcp command transferring a file from one workstation to another (Appendix
B). The primary reason for choosing the rcp command is that it uses TCP which is a reliable
transfer agent versus UDP which is unreliable. By using the rcp command, we are able to
measure the time from the rcp command being issued to the time the ack is received back
from the other workstation. The system can access the clock prior to issuing the rcp
command, and then again after it receives the ack from the other workstation. Since rcp
provides for reliable data transfer, this allows a measurement of the total transfer time.
Figure 15 shows the code obtaining the current system time, issuing the rcp command and
then obtaining the system time again after the transfer is complete.

a = gettimeofday(&timestart, zonestart);
if (a != 0)
    printf("Oops! %d\n", a);

/* Use system call to do file transfer */
system("rcp large.file gold-fddi:/usr/test/igtow_test");

/* Get stop time in sec&usec and check if successful */
b = gettimeofday(&timedone, zonedone);
if (b != 0)
    printf("Oops! %d\n", b);

Figure 15: Implementation of RCP System Call
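
Turning the two clock samples into a transfer rate might look like the sketch below. It is not the Appendix B program: elapsed_seconds and throughput_mbps are hypothetical helpers, and a megabit is assumed here to be 10^6 bits.

```c
#include <assert.h>
#include <sys/time.h>

/* Elapsed wall-clock time in seconds between two gettimeofday()
 * samples taken before and after the transfer. */
double elapsed_seconds(struct timeval start, struct timeval done)
{
    double sec = (double)(done.tv_sec - start.tv_sec);
    double usec = (double)(done.tv_usec - start.tv_usec);
    double elapsed = sec + usec / 1e6;
    assert(elapsed >= 0.0);
    return elapsed;
}

/* Throughput in megabits per second (assumed 10^6 bits per megabit). */
double throughput_mbps(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes * 8.0 / seconds / 1e6;
}
```

Because the usec difference can be negative when the seconds field rolls over, the two fields are subtracted separately and then combined; a transfer of 12,500,000 bytes in one second works out to 100 Mb/s under the stated assumption.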

This method includes all the overhead from the operating system, rcp, TCP, IP and
FDDI. After the rcp command is issued, the file is located in the file system and loaded into
memory. Next, the workstation from which the command is being executed must perform
a name/address resolution to determine where the file is being transferred. DNS provides
this name/address resolution. Once this name/address resolution is performed the file is
handed off to TCP to begin the transfer from workstation A to workstation B. TCP hands
the file transfer off to IP which forwards the file to the FDDI protocol. At this point the
FDDI SBus card transfers the file from workstation A to workstation B. At workstation B
the reverse scenario takes place. The file is handed off from the FDDI protocol to the IP
protocol, to the TCP protocol, and finally reaches the OS on workstation B. At this point,
TCP on workstation B must issue an ack to let workstation A know that the file has been
correctly received and handed off to the OS.

The rcp command copies files between machines. Each filename or directory
argument is either a remote file name of the form:

hostname:path

or a local file name (containing no : characters, or a / before any : characters).
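
The rule for telling a remote file name from a local one can be expressed in a few lines of C. This is a sketch of the rule as stated above, not code from rcp itself, and is_remote_spec is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if arg names a remote file (hostname:path) and 0 if it
 * names a local file. An argument is local when it contains no ':'
 * or when a '/' appears before the first ':'. */
int is_remote_spec(const char *arg)
{
    const char *colon, *slash;
    assert(arg != NULL);

    colon = strchr(arg, ':');
    if (colon == NULL)
        return 0;                   /* no ':' at all */
    slash = strchr(arg, '/');
    if (slash != NULL && slash < colon)
        return 0;                   /* '/' precedes the first ':' */
    return 1;
}
```

Under this rule "gold:/usr/test/file" is remote, while "large.file" and "./dir:with/colon" are both local.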

If a filename is not a full path name, it is interpreted relative to the user's home
directory on hostname. A path on a remote host may be quoted (using \, ", or ') so that the
metacharacters are interpreted remotely.

rcp does not prompt for passwords; your current local user name must exist on
hostname and allow remote command execution by rsh.

rcp handles third party copies, where neither source nor target files are on the current
machine. Hostnames may also take the form

username@hostname:filename

to use username rather than your current local user name as the user name on the
remote host. rcp also supports Internet domain addressing of the remote host, so that:

username@host.domain:filename

specifies the username to be used, the hostname, and the domain in which that host
resides. Filenames that are not full path names will be interpreted relative to the home
directory of the user named username, on the remote host.

E. PARAMETERS WHICH AFFECT BOTH TESTS

The  following  driver  parameters  will  be  tuned  under  Solaris  2.3. 

sbf_num_llc_rx      /* For LLC network traffic:
                    /* Number of 4k receive buffers, maximum is 64 4k buffers
                    /* Default is 48 4k buffers per NP-SB adapter

nfs_async_threads   /* Number of NFS threads for handling network file service
                    /* Default is 8

sbf_t_req           /* Amount of time for TTRT, default is 8ms
                    /* Range is from 2ms to 165ms

sbf_mtu             /* Maximum protocol packet size, default is 4352 bytes

The above 4 tunable parameters along with the TCP/IP window size will be varied
during the rcp and nttcp transfer tests. The TCP/IP window size controls the amount of data
permitted to be transferred between TCP acknowledgments. Numerous tests will be run
varying each of the four parameters to determine what combination of values provides the
optimum throughput performance and what weight each parameter has on the changes. The
baseline test will use the values the manufacturer recommends as the default values.

F.  FILE  SIZES  FOR  BOTH  TRANSFERS 

In order to measure the impact of the TCP, IP and FDDI overhead during the test,
various sizes of files will be transferred. For the rcp test, the properties of the four files to
be used are shown in TABLE 1. These files range in size from 6 bytes to 17,989,936 bytes.
The amount of overhead during the transfers can be estimated as follows:

For the nttcp test, the amounts of data to be transferred are shown in TABLE 2. The
amount of data to be transferred is obtained by specifying the length of a buffer to be
transferred and the number of buffers. As an example, if 2048 buffers of length 8192 bytes
are transferred, then a total of 16,777,216 bytes of data are being transferred. The
combinations listed in TABLE 2 give a range from 4,194,304 bytes to 2.684354e+08 bytes
being transferred.
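
The amount of data moved by one nttcp call is simply the product of the two options. A minimal sketch, with total_transfer_bytes as a hypothetical helper name:

```c
#include <assert.h>

/* Total bytes sent by one nttcp run: number of buffers (-n) times
 * the length of each buffer in bytes (-l). */
unsigned long long total_transfer_bytes(unsigned long num_buffers,
                                        unsigned long buffer_len)
{
    assert(num_buffers > 0 && buffer_len > 0);
    return (unsigned long long)num_buffers * buffer_len;
}
```

For example, 2048 buffers of 8192 bytes give 16,777,216 bytes, and the typical session shown earlier (-l65536 -n1024) moves 67,108,864 bytes.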

TABLE 1: RCP FILE SIZES AND ASSOCIATED OVERHEAD

File Size                      Total Overhead

Huge (17,989,936 bytes)        1.37%
Large (U14.923 bytes)          1.37%
Medium (48,072 bytes)          1.42%
Tiny (6 bytes)                 90.9%

In order to make it easier to reference which file size has been used in the various tests,
the files will be referred to as File A through File H, with File A being the smallest file,
4,194,304 bytes, and File H being the largest file, 268,435,400 bytes. The rest of the files
are in order of size from the smallest file to the largest file.


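TABLE 2: FILES (DATA SIZES) FOR NTTCP TEST

Number of Buffers    8192 bytes (Files A-D)    65536 bytes (Files E-H)
[illegible]          4194304 bytes             [illegible]
[illegible]          [illegible]               67108864 bytes
[illegible]          16777216 bytes            [illegible]
[illegible]          [illegible]               [illegible]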

G.  SYSTEM CONFIGURATIONS FOR ALL TESTS

As described in the previous sections, various tunable parameters and file sizes will be
used during this investigation. In order to obtain reliable results, numerous tests must be
conducted to achieve a comfortable confidence level. Unfortunately, it is not practicable to
perform all the test runs necessary to cover every possible combination, let alone run enough
iterations of each test to obtain the desired confidence level in the results.

As an example, just running the various combinations of tests described earlier with
the nttcp program, there were 576 measured data transfers during a single run. One such
test took a combined total of 3 hours and 15 minutes to run. During initial runs of the nttcp
program, the TCP/IP window size was varied in 4 k increments. It was determined that
there was little difference between the transfer rates of window sizes only 4 k apart.
Therefore, follow-on tests were run at intervals of 8 k window sizes. This change reduced
the run times from over 6 hours to just over 3 hours with little to no loss of usable results.

As noted earlier, there are other tunable parameters which can be modified by using
the set command in the /etc/system file. Once again, it is not possible to test all possible
combinations of parameters. As an example, if we start with the 576 measured data
transfers which took over 6 hours with a 4 k TCP/IP window size increment, then test the
TTRT parameter at 5 ms increments (33 tests), then the sbf_num_llc_rx buffers at 4 k
increments (15 tests), then the sbf_num_smt_rx buffers at 4 k increments (15 tests), and
assume that we would like a confidence level which requires 50 runs of each test, we would
have a total of 33*15*15*50 = 371,250 tests needed to reach any conclusions. If each test
took over six hours to conduct, it would take a total of 2,227,500 hours or 92,812.5 days
just to finish conducting the tests.
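The size of this full test matrix can be checked with a short calculation. The variable names are illustrative only, and six hours per test is taken as a lower bound, since the text says each test took "over six hours".

```python
# Number of settings for each tunable parameter, as given in the text.
ttrt_settings = 33      # TTRT varied in 5 ms increments
llc_rx_settings = 15    # sbf_num_llc_rx buffers varied in 4 k increments
smt_rx_settings = 15    # sbf_num_smt_rx buffers varied in 4 k increments
runs_per_test = 50      # runs assumed necessary for the desired confidence level
hours_per_test = 6      # lower bound; the text says "over six hours"

total_tests = ttrt_settings * llc_rx_settings * smt_rx_settings * runs_per_test
total_hours = total_tests * hours_per_test
total_days = total_hours / 24

print(f"{total_tests:,} tests, {total_hours:,} hours, {total_days:,.1f} days")
```

At 365 days per year this is roughly 254 years of continuous testing, which is why a reduced test plan is needed.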

In his book [JAIN91], Jain discusses this dilemma of having too many variables
to consider. The solution is to first get a gross picture of the impact of changing selected
parameters. Once a parameter's impact on performance has been determined, then more
thorough testing can be conducted by adjusting the correct parameters to obtain the desired
confidence level. An example of this method in practice is changing from 4 k intervals in
the TCP/IP window size to 8 k intervals.

In addition to the tunable parameters already discussed, this investigation is looking
into the impact of the workstations running in multiprocessor mode and using a recently
developed operating system, Solaris 2.3. This doubles the required testing. First, tests
will be conducted in the two processor configuration. Then, each Sun SPARCstation will
be tested with only a single processor, but still running Solaris. Once again, it is not possible
to test all possible tunable parameters, especially in both hardware configurations. Once a
pattern has been established in the single processor configuration, follow-on tests in the
multi-processor hardware configuration will be focused to limit the scope of tests to
changing those parameters which produce the best results.

H.  PARAMETER  BASELINE 

First, a baseline condition must be established before any changes are made to the
system. This baseline will use the parameter values shown in TABLE 3. This
table pertains more to the parameter settings in the nttcp and rcp tests than the Neal Nelson
Benchmark test. The first parameter, nfs_async_threads, has an impact on all three tests.
The other three parameters only impact the results of the nttcp and rcp tests. No changes will
be made to the workstations other than the changes to the tunable parameters listed below.
Stored with the results of each nttcp and rcp test run is a README file with the below
parameters and their values for that test.

While the below parameters are changed for the nttcp and rcp tests, the TCP/IP window
size will also be varied. The TCP/IP window size is not listed below in TABLE 3 as a
tunable parameter. It is being treated differently because of the way it is varied during the
test transfers. The nttcp program will be varying the TCP/IP window size during the test,
whereas the below listed tunable parameters can only be changed by rebooting the
workstations in between the various tests.


TABLE 3: DEFAULT PARAMETERS USED FOR ALL THREE TESTS

Test                     nfs_async_threads   t_req   sbf_num_llc_rx   sbf_mtu
Neal Nelson Benchmark    8                   8 ms    48K              4352
nttcp                    8                   8 ms    48K              4352
RCP                      8                   8 ms    48K              4352

Below is a review of the parameter descriptions:

sbf_num_llc_rx  /* For LLC network traffic. Number of 4k receive buffers
/* maximum is 64 4k buffers

sbf_mtu  /* Maximum protocol packet size, default is 4352 bytes

t_req  /* Token holding time, default is 8 ms

nfs_async_threads  /* For NFS service. Number of threads allotted. Default is 8

The results of the initial nttcp baseline test during the single processor test are shown
below in TABLE 4. The results shown in this table are the averaged results obtained from
running this test for six runs. The first column shows the TCP/IP window size used during
the test. The next 8 columns, which are labeled File A through File H, show the averaged
measured throughput in Mbps achieved during this test run.


TABLE 4: TEST RESULTS IN SINGLE PROCESSOR MODE


Mbps 

Mbps 

32.77 

38Jt3 

118J3 

29.13 

32.77 

43.69 

32.77 

49.15 

3177 

43.69 

3177 

49.15 

74.96 

49.15 

3177 

43.69 



Mbps 

Mbps 

Mbps 

32.46 

31.92 

31J1 

24.63 

25.42 

24.93 

40J7 

40J3 

40.62 

40.57 

40J9 

41.67 

41.61 

40J9 

41.67 

40J7 

39.43 

4126 

38.75 

37.93 

39  J5 

3172 

34J7 

30.09 

V.  TEST  RESULTS  AND  ANALYSIS 


In this chapter, the results from the three tests discussed in Chapter IV will be
presented. First, the results from the Neal Nelson Benchmark tests will be presented. These
results will show that the newer, faster 50MHz processors should outperform the older
40MHz processors. Next, the results from the New Test TCP (nttcp) network throughput
tests will be presented. These results will show under what conditions the highest
throughput can be achieved and what throughput bottlenecks exist. Last, the results from
the rcp transfer tests will be presented. These results will help to identify bottlenecks within
the workstation as a whole. The nttcp tests directly access the TCP/IP layer and do not
provide a true measure of all the overhead present in distributed processing.

A.  NEAL  NELSON  BENCHMARK 

The Neal Nelson Benchmark is the tool being used to measure the capabilities of the
workstations and the operating systems being tested. It is important to confirm by
measurement that the hardware expected to be faster actually performs faster.

To  begin  with,  two  system  disks  were  configured  with  the  Solaris  2.3  operating  system 
and  one  system  disk  was  configured  with  the  SunOS  4.1.3  operating  system.  A  three 
gigabyte  disk  was  partitioned  and  half  of  it  made  into  a  Unix  file  system,  leaving  the  other 
half  as  a  raw  disk  partition.  The  source  code  for  the  benchmark  was  obtained,  installed,  and 
compiled  under  Solaris  2.3  and  SunOS  4.1.3  with  the  default  tuning  parameters. 

The benchmark was started in the background and took approximately 20 hours to run
under each of the following four hardware configurations: Gold with two 50MHz
processors and White with two 40MHz processors, each running Solaris 2.3; Gold with one
50MHz processor running Solaris 2.3; Gold with one 50MHz processor running SunOS
4.1.3. Solaris 2.3 is Sun Microsystems' new operating system based on AT&T System V
Unix, while SunOS 4.1.3 is based on Berkeley Unix.

Once the benchmark testing was completed, the results were collected and
electronically mailed to Neal Nelson & Associates, where the test reports were generated.
The results from the three different configurations discussed below are listed in Appendix
C with approval from Neal Nelson & Associates.

1.  Gold Versus White, Two Processors and Solaris 2.3

In group 1 tests, which are intended to approximate the processing activities of
five types of users, Gold consistently performed the tasks approximately 20 percent faster
than White.


Figure  16:  Gold  Versus  White,  Two  Processors 


In group 2 tests, which are designed to perform various types of calculation tasks
and thereby profile the performance of the computer's calculation subsystem, Gold
continued to perform the tasks approximately 20 percent faster than White.

In group 3 tests, which performed a series of disk input and output functions to
profile the performance of the disk subsystem, the results were mixed, but Gold still
outperformed White on the average. These results varied from Gold outperforming White
by an average of 20 percent, to times when White outperformed Gold.

Figure 16 shows the graphical results of Test 1, Simulated Office
Automation Workload. Gold, with two 50MHz processors running Solaris 2.3, clearly took
less time to perform the test than White with two 40MHz processors running Solaris 2.3,
except at a load of 11. Once again, a load can signify either several light users or a single
heavy user. As the loads increase you have either more light users or multiple heavy users.


[Plot: horizontal axis LOAD, 0 to 20]


Figure 17: Gold One Processor Versus Gold Two Processors


2.  Gold One Processor Versus Gold Two Processors and Solaris 2.3

In group 1 tests, the two processor configuration consistently outperformed the
single processor configuration by 80 to 90 percent.

In group 2 tests, the two processor configuration continued to outperform the
single processor configuration by 80 to 90 percent in all areas but one. In test 14,
Semaphores, Shared Memory and Context Switch, the two processor configuration only
outperformed the single processor configuration by 5 to 7 percent.

In group 3 tests, the results were once again mixed. The two processor
configuration outperformed the single processor configuration by 50 percent in all tests
but three. In test 19, 1024 byte Sequential Writes from Unix File(s), and test 21, 8192 byte
Sequential Writes to Unix File(s), the single processor outperformed the two processor
configuration by an average of over 200 percent. In test 30, 8192 byte Sequential Writes
then 'sync', the single processor configuration outperformed the two processor
configuration by approximately 20 percent.

Figure 17 shows the graphical results of Test 1, Simulated Office
Automation Workload. Gold with one 50MHz processor running Solaris 2.3 clearly took
more time to perform the test than Gold with two 50MHz processors running Solaris 2.3.

3.  Gold With One Processor, Solaris 2.3 Versus SunOS 4.1.3

In group 1 tests, the results were once again varied. SunOS 4.1.3 outperformed
Solaris 2.3 in 4 of the 5 tests at the higher load levels by 3 to 4 percent. Solaris 2.3
outperformed SunOS 4.1.3 in two of the tests at the lighter load levels by 3 to 4 percent.

In group 2 tests, the results were more consistently in favor of SunOS 4.1.3. In 7
of the 12 tests, SunOS 4.1.3 outperformed Solaris 2.3 by 4 to 5 percent. In test 13, Math
Library Functions, SunOS 4.1.3 outperformed Solaris 2.3 by an average of 40 percent.
Solaris 2.3 outperformed SunOS 4.1.3 in only three of the test areas. In two of those areas
the margin was, once again, only 2 to 3 percent. In test 17, Increasing Depth of Function
Calls, Solaris 2.3 outperformed SunOS 4.1.3 by an average of 40 to 50 percent.

In group 3 tests, the results were once again varied. In 6 of the tests, SunOS 4.1.3
outperformed Solaris 2.3 by anywhere from 15 to over 500 percent. In seven of the tests,
Solaris 2.3 outperformed SunOS 4.1.3 by anywhere from 100 to over 400 percent. Once again,
though, it appears that SunOS 4.1.3 came out slightly ahead of Solaris 2.3 in the high
load area.

Below in Figure 18 are the graphical results of Test 1, Simulated Office
Automation Workload. Gold with one 50MHz processor running SunOS 4.1.3 slightly beat
out Gold with one 50MHz processor running Solaris 2.3 at the higher loads.


B.  NEW  TEST  TRANSMISSION  CONTROL  PROTOCOL 

As discussed in Chapter IV, the file sizes used during the test runs with New Test TCP
(nttcp) are shown below in TABLE 5. The files are created by specifying the length of the
buffer to be created and the number of buffers to be sent. The files will be referred to as File
A through File H, with File A being the smallest file, 4,194,304 bytes, and File H being the
largest file, 268,435,400 bytes. The rest of the files are in order of size from the smallest
file to the largest file.


TABLE 5: FILES (DATA SIZES) FOR NTTCP TEST

Length of Buffers: 8192 bytes (Files A-D)
Number of Buffers: [illegible]
FILE B: 8388608 bytes

After conducting several test runs and observing the results, it became obvious that
some smaller file sizes were not large enough to obtain accurate results. Whenever data is
transferred using the nttcp program, the actual CPU time is the time used for calculating the
throughput. If the CPU time used is too small, less than 0.1 seconds, the results become
unreliable. An example of an unreliable transfer rate is given below in Figure 19. The
reason for the inaccurate throughput result is the small amount of CPU time taken during
this data transfer.

Transfers using number of buffers = 512 and length of buffer = 8192 were the
only ones which had unreliable transfer rates. There were typically only one or two
transfer rates in each test which were unreliable. However, the window size at which the
unreliable transfer rate occurred was not always the same. Therefore, the results of File A
transfers were not used in this analysis.


send window size = 12288
receive window size = 12288

ttcp-r 4194304 bytes in 0.06 real seconds = 68266.67 KB/sec = 546.1333 Mb/s


Figure 19: NTTCP Output for File Size of 4194304 Bytes
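The rates printed in Figure 19 can be reproduced with the sketch below. The function name is hypothetical, and the unit conventions (1 KB = 1024 bytes, 1 Mb = 1000 Kb) are inferred from the numbers in the figure rather than taken from the nttcp source. The loop at the end uses hypothetical elapsed times to show how strongly a 0.01 second difference in a very short measurement changes the reported rate.

```python
def nttcp_rates(num_bytes: int, seconds: float) -> tuple[float, float]:
    """Return (KB/sec, Mb/s) for a transfer, using conventions inferred from Figure 19."""
    kb_per_sec = num_bytes / seconds / 1024   # assumed: 1 KB = 1024 bytes
    mb_per_sec = kb_per_sec * 8 / 1000        # assumed: 1 Mb = 1000 Kb
    return kb_per_sec, mb_per_sec

# File A transfer from Figure 19: 4194304 bytes in 0.06 seconds.
kb, mb = nttcp_rates(4_194_304, 0.06)
print(f"{kb:.2f} KB/sec = {mb:.4f} Mb/s")

# Hypothetical neighbouring timings: a 0.01 s change in the measured time
# moves the computed rate by tens of Mb/s when the interval is this short.
for elapsed in (0.05, 0.06, 0.07):
    print(f"{elapsed:.2f} s -> {nttcp_rates(4_194_304, elapsed)[1]:.1f} Mb/s")
```

A computed rate well above the 100 Mbps physical rate of FDDI is a clear sign that the measured interval was too short to be trusted.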
1.  Single Processor Results

The first 32 tests were run while Gold and White were set up in a single-processor
configuration running Solaris 2.3. These 32 tests represent a small subset of all possible
tunable parameter combinations. The primary focus of this first set of tests was to determine
the effect of modifying the TCP/IP window size, the nfs_async_threads and the t_req
parameters. Additionally, tests were conducted transferring data from White to Gold, Gold
to White and both ways simultaneously. The 32 tests and the values of the tunable
parameters are listed in TABLE 36, Appendix D.

The data gathered in the above 32 tests was analyzed using multiple linear
regression analysis according to the model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_m x_m + \epsilon$, which relates the
behavior of a dependent variable $y$ to a linear function of the set of independent variables
$x_1, x_2, \dots, x_m$. The $\beta_i$'s are the parameters that specify the nature of the
relationship, and $\epsilon$ is the random error term. The dependent variable $y$ in this
model is throughput. Refer to Figure 20, under the bold face number 12, for the list of
$\beta_i$'s used in this model.

The tool used to produce the multiple linear regression analysis is Statistical
Analysis System (SAS). The SAS tool is used to assist data analysts in analyzing data using
regression analysis. Below in Figure 20 is an analysis of data throughput between White
and Gold in the single processor configuration using the results from tests 1 - 32. Below is
a description of the output from SAS as explained in [SASI91]. The bold face numbers
have been added to aid in the description of the output.

1.  The  name  of  the  dependent  variable  is  THRUPUT. 

2.  The  degrees  of  freedom  (DF)  associated  with  the  sums  of  squares  (SS). 

3.  The Regression SS (called Model SS) is 61279.61308, and the Residual SS
(called ERROR SS) is 65217.01718. The sum of these two sums of squares is the C TOTAL
(corrected total) SS = 126496.63026. This illustrates the basic identity in regression
analysis that TOTAL SS = MODEL SS + ERROR SS. Usually, a good model results in the
MODEL SS being a large fraction of the C TOTAL SS.

4.  The corresponding Mean Squares are the Sum of Squares divided by the
respective DF. The MS for ERROR (MSE) is an unbiased estimate of $\sigma^2$, provided the
model is correctly specified.

5.  The value of the F statistic, 239.470, is the ratio of the MODEL Mean Square
divided by the ERROR Mean Square. It is used to test the hypothesis that all coefficients
in the model, except the intercept, are 0. In this case, this hypothesis is:

$H_0: \beta_1 = \beta_2 = \dots = \beta_7 = 0$

6.  The p value (Prob>F) of 0.0001 indicates that some of the $\beta_i$ are not equal to 0.

7.  Root MSE = 6.04621 is the square root of the ERROR MS and estimates the
error standard deviation.

8.  Dep Mean = 30.21891 is simply the average of the values of the variable
THRUPUT over all observations in the data set.

9.  C.V. = 20.00803 is the coefficient of variation expressed as a percentage. This
measure of relative variation is the ratio of Root MSE to Dep Mean, multiplied by 100.

10.  R-SQUARE = 0.4844 shows that roughly 48 percent of the variation in
THRUPUT can be explained by variation in the independent variables in the model.

11.  ADJ R-SQ is an alternative to R-SQUARE
that is adjusted for the number of parameters in the model according to the formula

ADJ R-SQ = 1 - (1 - R-SQUARE)((n - 1)/(n - m - 1))

where n is the number of observations in the data set and m is the number of
regression parameters in the model, excluding the intercept. This adjustment is used to
overcome an objection to R-SQUARE as a measure of goodness of fit of the model. This
objection stems from the fact that R-SQUARE can be driven to 1 simply by adding
superfluous variables to the model with no real improvement in fit. This is not the case with
ADJ R-SQ, which tends to stabilize to a certain value when an adequate set of variables is
included in the model.

Model: SINGLE PROCESSOR MODEL
Dependent Variable: THRUPUT  (1)

                         Analysis of Variance

               (2)        (3)             (4)           (5)        (6)
                        Sum of           Mean
Source          DF      Squares         Square       F Value     Prob>F

Model            7    61279.61308     8754.23044     239.470     0.0001
Error         1784    65217.01718       36.55662
C Total       1791   126496.63026

(7)  Root MSE     6.0462       (10)  R-square    0.4844
(8)  Dep Mean    30.21891      (11)  Adj R-sq    0.4824
(9)  C.V.        20.00803

                         Parameter Estimates

(12)               (13)           (14)           (15)           (16)
                 Parameter      Standard      T for H0:
Variable   DF    Estimate        Error        Parameter=0    Prob > |T|

INTERCEP    1    27.673306     0.68625789       40.325        0.0001
SINGLE      1     8.620893     0.28565645       30.179        0.0001
WHITRAN     1     5.140603     0.28565645       17.996        0.0001
NUMBUFF     1    -0.000246     0.00010718       -2.295        0.0219
LENBUFF     1    -0.000107     0.00000511      -20.927        0.0001
WINDSIZE    1     0.008507     0.00779192        1.092        0.2751
TTRT        1     0.016060     0.01864409        0.861        0.3891
THREADS     1     0.008069     0.03570706        0.226        0.8212

Figure 20: SAS Analysis of Single Processor Transfers


12.  The labels INTERCEP, SINGLE, WHITRAN, NUMBUFF, LENBUFF,
WINDSIZE, TTRT and THREADS identify the coefficient estimates. The parameter
SINGLE is used to show if the transfers were just from one workstation at a time, or if
both White and Gold were transmitting at the same time. The parameter WHITRAN is used
to show if White is transmitting or if Gold is transmitting. The other parameters were
previously described in Chapter IV, Test Design Plan.

13.  The Parameter Estimates give the fitted model

THRUPUT = 27.673306 + 8.620893(SINGLE) + 5.140603(WHITRAN)

- 0.000246(NUMBUFF) - 0.000107(LENBUFF)

+ 0.008507(WINDSIZE) + 0.016060(TTRT) + 0.008069(THREADS)

Thus, for example, a window size of 1k contributes 0.008507 to the throughput of
data if all other parameters are held fixed. If the window size is 45k, then it contributes
0.382815 if all other parameters are held fixed.

14.  These  are  the  (estimated)  standard  errors  of  the  parameter  estimates  and  are 
useful  for  constructing  confidence  intervals  for  the  parameters. 

15.  The t tests (T for H0: Parameter = 0) are used for testing hypotheses about
individual parameters. The complete model for all of these t tests contains all the variables
on the right side of the MODEL statement. The reduced model for a particular test contains
all these variables except the one being tested. Thus, the t statistic for WINDSIZE (whose
estimate is 0.008507), used for testing the hypothesis $H_0: \beta_{\mathrm{WINDSIZE}} = 0$, is
actually testing whether the complete model
containing NUMBUFF, LENBUFF, WINDSIZE, TTRT and THREADS fits better than
the reduced model containing only NUMBUFF, LENBUFF, TTRT and THREADS.

16.  The p value (Prob > |T|) for this test is p = 0.2751.
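The fitted model in item 13 can be evaluated directly, as in the sketch below. The coefficients are copied from Figure 20. The function names are illustrative, and the units are assumptions: WINDSIZE is taken to be in k (as in the 1k and 45k examples in item 13), and SINGLE and WHITRAN are taken to be 0/1 indicator variables whose exact coding is not stated in this section.

```python
# Coefficient estimates from Figure 20 (single processor model).
COEFFICIENTS = {
    "INTERCEP": 27.673306,
    "SINGLE": 8.620893,
    "WHITRAN": 5.140603,
    "NUMBUFF": -0.000246,
    "LENBUFF": -0.000107,
    "WINDSIZE": 0.008507,
    "TTRT": 0.016060,
    "THREADS": 0.008069,
}

def contribution(name: str, value: float) -> float:
    """Contribution of one independent variable with all others held fixed."""
    return COEFFICIENTS[name] * value

def predict_thruput(**settings: float) -> float:
    """Evaluate the fitted model for the given variable settings."""
    return COEFFICIENTS["INTERCEP"] + sum(
        contribution(name, value) for name, value in settings.items()
    )

# The two worked examples from item 13.
print(contribution("WINDSIZE", 1), contribution("WINDSIZE", 45))

# A hypothetical configuration (all values below are assumptions).
print(predict_thruput(SINGLE=1, WHITRAN=1, NUMBUFF=2048, LENBUFF=8192,
                      WINDSIZE=45, TTRT=8, THREADS=8))
```

Because R-square for this model is only 0.4844, a prediction of this kind should be read as a rough trend rather than an expected measurement.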

As shown in Figure 20 under item 16, Prob > |T|, the parameters NUMBUFF,
WINDSIZE, TTRT and THREADS had the least impact on THRUPUT in this model. The
higher the Prob > |T| of an independent variable, the less impact it has on
the dependent variable being modeled. Included in this model was the system transferring
the data (WHITRAN) and whether it was a one way transfer or two way transfer (SINGLE).
Therefore, the tunable parameters are competing with the fact that a 40MHz workstation is
being compared to a 50MHz workstation and whether or not another station is competing
for the token to transfer data.
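The t statistics and Prob > |T| values discussed above can be recomputed from the estimates and standard errors in Figure 20. This is a sketch, not the SAS procedure: the p values use a normal approximation to the t distribution, which is an assumption that is reasonable here only because the error degrees of freedom (1784) are large. Small differences from Figure 20 come from the rounding of the printed estimates.

```python
from math import erfc, sqrt

# (estimate, standard error) pairs copied from Figure 20.
ESTIMATES = {
    "SINGLE": (8.620893, 0.28565645),
    "WHITRAN": (5.140603, 0.28565645),
    "NUMBUFF": (-0.000246, 0.00010718),
    "LENBUFF": (-0.000107, 0.00000511),
    "WINDSIZE": (0.008507, 0.00779192),
    "TTRT": (0.016060, 0.01864409),
    "THREADS": (0.008069, 0.03570706),
}

def t_statistic(estimate: float, std_error: float) -> float:
    return estimate / std_error

def two_sided_p(t: float) -> float:
    """Two-sided p value under a normal approximation (assumes large DF)."""
    return erfc(abs(t) / sqrt(2))

t_values = {name: t_statistic(*pair) for name, pair in ESTIMATES.items()}

# Rank the variables by |t|: larger |t| means a smaller Prob > |T|.
ranked = sorted(t_values, key=lambda name: abs(t_values[name]), reverse=True)
for name in ranked:
    print(f"{name:9s} t = {t_values[name]:8.3f}  p ~ {two_sided_p(t_values[name]):.4f}")
```

Ranking by |t| is only one rough way to order the variables; it measures how reliably a coefficient differs from zero, not the size of its effect on throughput.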

The end result in this model is that the independent variable SINGLE has the most
impact on THRUPUT and WHITRAN has the next largest impact on THRUPUT. This
shows that competition for the token has more impact on throughput than tuning the
system. However, there is still a performance gain to be realized by tuning the system for
better throughput. Figure 21 is a graphic comparison of the 1st Test with the 29th Test.
As a reminder, the 1st Test uses the default parameters and the 29th Test uses the
following parameter settings: t_req = 25ms; nfs_async_threads = 16; sbf_num_llc_rx = 48.


Figure 21: Single Processor, File D Transfer From White to Gold


2.  Two  Processor  Results 

The second set of tests was run while Gold and White were set up in a two-processor
configuration running Solaris 2.3. These 48 tests represent a small subset of all
possible tunable parameter combinations. The primary focus of this set of tests was to
determine the effect of modifying the TCP/IP window size, the nfs_async_threads, t_req,
sbf_num_llc_rx and the sbf_mtu parameters. The 48 tests and the values of the tunable
parameters are listed in TABLE 71, Appendix E.

The primary difference between this set of tests and the single processor tests is
that all transfers were made from White to Gold. To have also included transfers from Gold
to White in this set of tests would have doubled the number of tests to 96.
Originally it was thought that by increasing the number of parameters being observed the
R-square value would also increase. The intention here was to account for more of
the factors which impact the dependent variable THRUPUT.


Model: TWO PROCESSOR MODEL
Dependent Variable: THRUPUT

                         Analysis of Variance

                        Sum of           Mean
Source          DF      Squares         Square       F Value     Prob>F

Model            7    66901.88212     9557.41173      68.151     0.0001
Error         2680   375842.31356      140.23967
C Total       2687   442744.19568

Root MSE    11.84228      R-square    0.1511
Dep Mean    40.72729      Adj R-sq    0.1489
C.V.        29.07702

                         Parameter Estimates

                 Parameter        Standard      T for H0:
Variable   DF    Estimate          Error        Parameter=0    Prob > |T|

INTERCEP    1   -91.980251       12.35679655      -7.444        0.0001
NUMBUFF     1    -0.000068737     0.00017141      -0.401        0.6884
LENBUFF     1    -0.000062619     0.00000817      -7.664        0.0001
WINDSIZE    1    -0.019754        0.01246095      -1.585        0.1130
TTRT        1    -0.024980        0.02981591      -0.838        0.4022
THREADS     1    -0.034226        0.05710325      -0.599        0.5490
LLC         1     0.643378        0.03496846      18.399        0.0001
MTU         1     0.024786        0.00285516       8.681        0.0001

Figure 22: SAS Analysis of Two Processor Transfers

As shown in Figure 22, the R-square value decreased considerably
between the single processor test and the dual processor test. As will be shown later,
the cause for this decrease was the removal of the largest influence on throughput, competition
with other stations for the token. Another indicator of the lack of confidence in the data
being modeled is the large Standard Error for INTERCEP. In the
single processor model INTERCEP had a standard error of 0.68625789. In the dual processor
model, the standard error has increased to 12.35679655.

The independent variables NUMBUFF, THREADS and TTRT continued to have
the least amount of impact on the dependent variable THRUPUT, as indicated by their high
Prob > |T| values. The independent variables with the largest impact were LENBUFF, LLC
and MTU.
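The summary statistics in Figures 20 and 22 follow from the sums of squares and degrees of freedom alone, so the drop in R-square between the two models can be checked independently of SAS. The function below is a sketch with hypothetical names; the input values are copied from the two figures, and the formulas are those given in items 3 through 11 of the description of Figure 20.

```python
from math import sqrt

def anova_summary(model_ss: float, model_df: int,
                  error_ss: float, error_df: int, dep_mean: float) -> dict:
    """Recompute the summary statistics of a regression ANOVA table."""
    total_ss = model_ss + error_ss            # TOTAL SS = MODEL SS + ERROR SS
    n = model_df + error_df + 1               # number of observations
    mse = error_ss / error_df                 # ERROR mean square
    root_mse = sqrt(mse)
    r_square = model_ss / total_ss
    adj_r_sq = 1 - (1 - r_square) * (n - 1) / (n - model_df - 1)
    return {
        "F": (model_ss / model_df) / mse,
        "Root MSE": root_mse,
        "C.V.": 100 * root_mse / dep_mean,
        "R-square": r_square,
        "Adj R-sq": adj_r_sq,
    }

single = anova_summary(61279.61308, 7, 65217.01718, 1784, 30.21891)   # Figure 20
two = anova_summary(66901.88212, 7, 375842.31356, 2680, 40.72729)     # Figure 22

for label, stats in (("single processor", single), ("two processor", two)):
    print(label, {key: round(value, 4) for key, value in stats.items()})
```

The recomputed values agree with both figures to the printed precision, which also serves as a check on the transcription of the tables.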

3.  One  And  Two  Processor  Results 

In the final analysis of both one and two processor tests, some additional facts
need to be presented. There were a total of 4,480 throughputs measured in this analysis.
There were 896 measurements in the one processor configuration and 2688 measurements
in the two processor configuration. These are averaged measurements taken from the six
runs in each of the 32 + 48 = 80 tests. Also, there were 896 measurements where both Gold and
White were transmitting at the same time and 2688 measurements where only one station
was transmitting.

When the model was first run including all the data from the one and two
processor tests, the R-square value was only 0.3559. This was higher than in the two
processor model but lower than in the one processor model. A scatter plot was made of the
various parameters to determine where there might be some problems with individual
parameters. The most obvious problem was seen with the large variation of throughput with
the parameter window size. At both the high end and the low end, the plot of window size
versus throughput was not linear. By restricting the analysis of data to window sizes less
than 50k and greater than 16k, the R-square value increased to 0.6600. This reduced the
number of measured observations from 4,480 throughputs to 2,240 measured throughputs.


Model: ONE & TWO PROCESSOR MODEL
Dependent Variable: THRUPUT

Analysis of Variance

                      Sum of            Mean
Source      DF       Squares          Square      F Value    Prob>F

Model       10  179959.58511     17995.95851      432.681    0.0001
Error     2229   92708.03657        41.59176
C Total   2239  272667.62168

    Root MSE     6.44917     R-square    0.6600
    Dep Mean    42.53933     Adj R-sq    0.6585
    C.V.        15.16048

Parameter Estimates

                    Parameter       Standard     T for H0:
Variable   DF        Estimate          Error     Parameter=0    Prob>|T|

INTERCEP    1      -70.427345     9.87019489        -7.135      0.0001
SINGLE      1        9.928996     0.43090313        23.042      0.0001
WHITRAN     1        3.652165     0.43090313         8.476      0.0001
NUMBUFF     1    -0.000052070     0.00010226        -0.509      0.6107
LENBUFF     1    -0.000047372     0.00000487        -9.719      0.0001
WINDSIZE    1       -0.200113     0.01523473       -13.135      0.0001
TTRT        1       -0.012831     0.01778717        -0.721      0.4708
THREADS     1       -0.039099     0.03406588        -1.148      0.2512
LLC         1        0.583336     0.02693145        21.660      0.0001
MTU         1        0.015782     0.00219894         7.177      0.0001
SD          1        9.535964     0.44849820        21.262      0.0001

Figure 23: SAS Analysis of Single and Two Processor Transfers
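
The summary statistics in Figure 23 follow from the sums of squares and degrees of freedom in the Analysis of Variance table. As a consistency check, the short Python sketch below recomputes them using only the numbers printed in the figure. The variable names are illustrative and are not part of the SAS output.

```python
import math

# Values copied from the Analysis of Variance table in Figure 23.
ss_model, df_model = 179959.58511, 10
ss_error, df_error = 92708.03657, 2229
ss_total, df_total = 272667.62168, 2239
dep_mean = 42.53933                      # mean of THRUPUT, in Mbps

ms_model = ss_model / df_model           # mean square for the model
ms_error = ss_error / df_error           # mean square error

r_square = ss_model / ss_total
adj_r_square = 1 - (1 - r_square) * df_total / df_error
f_value = ms_model / ms_error
root_mse = math.sqrt(ms_error)
coeff_var = 100 * root_mse / dep_mean    # C.V., in percent

print(f"R-square  {r_square:.4f}")
print(f"Adj R-sq  {adj_r_square:.4f}")
print(f"F Value   {f_value:.3f}")
print(f"Root MSE  {root_mse:.5f}")
print(f"C.V.      {coeff_var:.5f}")
```

The recomputed values agree with the printed ones to the precision shown in the figure.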


The results of the one and two processor analysis are shown above in Figure 23. One new
independent variable, SD, is used to model whether the transfer comes from the one
processor tests or the two processor tests. Just as before, the independent variables
NUMBUFF, TTRT, and THREADS have the least impact on THRUPUT. With
the removal of the window sizes noted above, WINDSIZE now carries more weight in this
model. The largest impact on THRUPUT, in order of impact, is caused by the variables
SINGLE, SD, LLC and WINDSIZE. This statement will be covered in more detail later.
This indicates once again that processor power has the largest impact on throughput. A
graphical model of the difference is below in Figure 24. In this figure are plots of
throughput from identical parameter configurations, one from a two processor run
and the other from a one processor run.


Another useful result which can be determined from the analysis of the one and
two processor tests is a predicted throughput. Below in Figure 25 are SAS predictions of
THRUPUT based on the 2,240 measured throughputs used in this analysis. To achieve the
minimum predicted throughput, the following test was run using the parameter settings
indicated in Figure 25. Data was transferred from Gold to White and White to Gold
simultaneously. The results were taken from Gold with NUMBUFF = 4096, LENBUFF =
65536, WINDSIZE = 44, TTRT = 25, THREADS = 16, LLC = 40 and MTU = 4192. The
results are below in TABLE 6.

The SAS prediction for the minimum throughput was a rate of 15.5302 Mbps. As shown
in TABLE 6, the actual tests produced an average of 15.1463 Mbps and a median of
15.0454 Mbps. Since the data used in the model was averaged data rather than median
data, the average achieved rate is the more appropriate rate to compare. The SAS
prediction for the maximum throughput was a rate of 58.7810 Mbps. As shown in TABLE 6,
the actual tests produced an average of 60.07 Mbps and a median of 65.5360 Mbps. In both
cases the average throughput measured was within about 2.5 percent of the predicted
throughput. This shows that the SAS model was accurate at both ends of its range.

Figure 25: SAS Throughput Prediction


TABLE 6: RESULTS OF SAS PREDICTIONS

The following formula relates the behavior of the dependent variable THRUPUT
to a linear function of the set of independent variables SINGLE, WHITRAN, NUMBUFF,
LENBUFF, WINDSIZE, TTRT, THREADS, LLC, MTU and SD. The coefficients are the values
calculated in the One and Two Processor Model, Figure 23 on page 54.

THRUPUT = -70.427345 + 9.928996(SINGLE) + 3.652165(WHITRAN)
          - 0.000052070(NUMBUFF) - 0.000047372(LENBUFF)
          - 0.200113(WINDSIZE) - 0.012831(TTRT) - 0.039099(THREADS)
          + 0.583336(LLC) + 0.015782(MTU) + 9.535964(SD)

When the minimum and maximum throughput were predicted above in Figure 25
on page 56, it was simply a matter of inserting into the above formula the largest
parameter value if the parameter estimate is positive and the smallest parameter value if
the parameter estimate is negative. This resulted in the maximum predicted throughput.
For the minimum predicted throughput, the largest parameter value is used if the
parameter estimate is negative and the smallest parameter value if the parameter estimate
is positive.

Below are the formulas for minimum and maximum throughput with the
parameter estimates and parameter values multiplied together.

Maximum Throughput:

58.7544 = -70.427345 + 9.928996 + 3.652165 - 0.05331968 - 0.38807142 - 4.00226
          - 0.064155 - 0.312792 + 32.666816 + 68.683264 + 19.071928

Minimum Throughput:

15.5302 = -70.427345 + 0 + 0 - 0.21327872 - 3.1045714 - 8.804972 - 0.320775
          - 0.625584 + 23.33344 + 66.158144 + 9.535964
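
The selection rule described above can be sketched in a few lines of Python. The coefficients are the parameter estimates from Figure 23. The value ranges are assumptions: they were back-calculated by dividing each product in the two worked formulas by its coefficient, and the function names are illustrative only.

```python
# Parameter estimates from the One and Two Processor Model (Figure 23).
INTERCEPT = -70.427345
COEF = {
    "SINGLE": 9.928996, "WHITRAN": 3.652165,
    "NUMBUFF": -0.000052070, "LENBUFF": -0.000047372,
    "WINDSIZE": -0.200113, "TTRT": -0.012831, "THREADS": -0.039099,
    "LLC": 0.583336, "MTU": 0.015782, "SD": 9.535964,
}

# (smallest, largest) value per variable. Assumed: inferred from the worked
# minimum and maximum formulas, not listed explicitly in the text.
RANGE = {
    "SINGLE": (0, 1), "WHITRAN": (0, 1),
    "NUMBUFF": (1024, 4096), "LENBUFF": (8192, 65536),
    "WINDSIZE": (20, 44), "TTRT": (5, 25), "THREADS": (8, 16),
    "LLC": (40, 56), "MTU": (4192, 4352), "SD": (1, 2),
}

def predict(values):
    """Evaluate the linear model for one set of parameter values."""
    return INTERCEPT + sum(COEF[name] * values[name] for name in COEF)

def extreme_settings(maximize):
    """Pick, per variable, the end of its range that pushes THRUPUT up or down."""
    settings = {}
    for name, (low, high) in RANGE.items():
        wants_high = (COEF[name] > 0) == maximize
        settings[name] = high if wants_high else low
    return settings

max_thruput = predict(extreme_settings(maximize=True))
min_thruput = predict(extreme_settings(maximize=False))
print(f"maximum predicted THRUPUT: {max_thruput:.4f} Mbps")
print(f"minimum predicted THRUPUT: {min_thruput:.4f} Mbps")
```

With the rounded estimates printed in Figure 23 this gives roughly 58.76 and 15.53 Mbps, which agrees with the quoted 58.7544 and 15.5302 Mbps to within the rounding of the coefficients.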

Once the minimum and maximum throughputs were computed, the relative value
of each parameter was calculated by subtracting the parameter's value in the minimum
calculation from its value in the maximum calculation. Below in Figure 26 are the results
from this calculation. The value from the maximum calculation is listed, then the value
from the minimum calculation is listed and finally the difference is listed. It is this
difference which shows the impact each parameter has on the end throughput. The higher
the difference is, the more weight that parameter carries in determining the maximum
throughput.

Figure 26: Relative Importance of Each nttcp Parameter


The results listed above show that the following parameters, in order of
importance, have the most impact on throughput using the current model:

• Whether data was being transferred one-way from one workstation to another or
both workstations were transferring data to each other simultaneously.

• Whether the workstation had one or two processors.

• The number of 4K receive buffers allotted for receiving data.

• The size of the TCP/IP window available for sending data.
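
The ordering in the list above can be reproduced from the regression coefficients: each variable's weight is the size of its coefficient multiplied by the width of the range it was varied over. The Python sketch below assumes the same value ranges implied by the minimum and maximum formulas (these ranges are inferred, not quoted from the text), and the names are illustrative.

```python
# Parameter estimates from Figure 23 (intercept omitted: it does not vary).
COEF = {
    "SINGLE": 9.928996, "WHITRAN": 3.652165,
    "NUMBUFF": -0.000052070, "LENBUFF": -0.000047372,
    "WINDSIZE": -0.200113, "TTRT": -0.012831, "THREADS": -0.039099,
    "LLC": 0.583336, "MTU": 0.015782, "SD": 9.535964,
}

# Assumed (smallest, largest) value per variable, inferred from the worked
# minimum and maximum throughput formulas.
RANGE = {
    "SINGLE": (0, 1), "WHITRAN": (0, 1),
    "NUMBUFF": (1024, 4096), "LENBUFF": (8192, 65536),
    "WINDSIZE": (20, 44), "TTRT": (5, 25), "THREADS": (8, 16),
    "LLC": (40, 56), "MTU": (4192, 4352), "SD": (1, 2),
}

# Difference between a variable's contribution to the maximum prediction
# and its contribution to the minimum prediction, in Mbps.
impact = {
    name: abs(COEF[name]) * (RANGE[name][1] - RANGE[name][0]) for name in COEF
}

ranking = sorted(impact, key=impact.get, reverse=True)
for name in ranking:
    print(f"{name:9s}{impact[name]:8.4f}")
```

Under these assumptions the first four entries are SINGLE, SD, LLC and WINDSIZE, and the differences sum to the gap between the maximum and minimum predictions.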


Since the TCP/IP window size was limited in the above model to a range of 20k to 44k,
this parameter appears to have less of an impact than it really has. As an example, in
TABLE 72 on page 120 of Appendix E, the throughput rate for File C is 32.77 Mbps for a
window size of 4k and 58.25 Mbps for a window size of 44k. That means the throughput
rate at a 4k window size is only 56 percent of the rate at the 44k window size. In this
case, the window size has the largest impact on throughput performance. Unfortunately,
the results at the lower and higher window sizes were not consistent in all cases and that
data was removed from the analysis. In most cases, though, the difference in throughput
performance between a TCP/IP window size of 4k and a window size greater than 20k
is more significant than any other factor considered in this investigation.

Based on visual inspection of the results from both the one processor tests and
the two processor tests, below is a revised list, in order of importance, of the parameters
having the most impact on throughput:

• The size of the TCP/IP window available for sending data.

• Whether data was being transferred one-way from one workstation to another or
both workstations were transferring data to each other simultaneously.

• Whether the workstation had one or two processors.

• The number of 4K receive buffers allotted for receiving data.

Another parameter which showed unexpected results is the WHITRAN
parameter. This parameter is used to track any difference in throughput between
transmitting data from White to Gold and from Gold to White. The result in Figure 25 on
page 56 indicates that transmitting data from White to Gold was faster than transmitting
data from Gold to White. In the first 32 one processor tests, White had one 40MHz
processor and Gold had one 50MHz processor. In the second 48 tests, White had two
40MHz processors and Gold had two 50MHz processors. Based on the Neal Nelson
Benchmark tests, Gold should be capable of transferring data faster than White.

Several additional tests were conducted to determine why White was able to
transmit data at a higher throughput than Gold. First, the FDDI cards were swapped to see
if the FDDI card in Gold was causing the problem. The results of these tests are in TABLE
69 on page 117 and TABLE 70 on page 118. There was no noticeable difference in
throughput rates with the boards swapped. Next, the two 50MHz processors were placed in
White and the two 40MHz processors were placed in Gold. The results of these tests are in
TABLE 121 and TABLE 122 on page 137. As shown in Figure 27, even when both
transmitting systems had two 50MHz processors and both receiving systems had 40MHz
processors, White still had a higher throughput rate with File C than Gold.

Figure 27: Throughput Comparison Between White and Gold

The only other difference between White and Gold is that Gold is the server on
the FDDI network. Since the FDDI network only had three workstations on it,
this additional load on Gold should not be that great.

C.  REMOTE  COPY  PROTOCOL  TRANSFERS 

Initially, the plan was to conduct file transfers using the rcp command, varying the
tunable parameters just as in the nttcp tests. However, it was quickly observed that there
were no noticeable differences in measured throughput at the different parameter
settings. This was understandable for the parameters nfs_async_threads and t_req, since
the SAS model showed that these tunable parameters had little effect on throughput.
However, it was expected that there would be some difference in throughput rates as the
TCP/IP window size, llc and mtu parameters were varied.

The reason these parameters did not have an impact is that rcp does small
read()’s and write()’s, so the syscall overhead dominates over the time spent in the
kernel in TCP. If an application wants optimum bulk data throughput, it should increase the
receive buffering and also do moderately large read()’s and write()’s so that the syscall
overhead does not dominate. Also, rcp has to go through a complete login, exec of the
user’s shell, and run through the user’s “.cshrc” or “.profile” on the server side before it
begins transferring any data. If the data transfer is not really huge, the time spent logging
in will be much greater than the time spent transferring the data.
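
The advice above (larger socket buffers and moderately large reads and writes) can be illustrated with a minimal Python sketch. This is not how rcp or nttcp is implemented. The function names, the 44 KB buffer request and the chunk sizes are illustrative assumptions, and the demonstration runs over a local socket pair rather than an FDDI link.

```python
import socket
import threading

def request_buffers(sock, nbytes):
    """Ask the kernel for larger socket buffers; the OS may round or cap the value."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, nbytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, nbytes)

def send_in_chunks(sock, payload, chunk_size):
    """Send payload in chunk_size pieces; return how many sendall() calls were made."""
    view = memoryview(payload)
    calls = 0
    for offset in range(0, len(view), chunk_size):
        sock.sendall(view[offset:offset + chunk_size])
        calls += 1
    return calls

def receive_exactly(sock, expected, chunk_size):
    """Read until expected bytes arrive or the peer closes; return the byte count."""
    received = 0
    while received < expected:
        data = sock.recv(min(chunk_size, expected - received))
        if not data:
            break
        received += len(data)
    return received

def loopback_demo(total_bytes, chunk_size, buffer_bytes=44 * 1024):
    """Move total_bytes across a local socket pair and report the call count."""
    sender, receiver = socket.socketpair()
    request_buffers(sender, buffer_bytes)
    request_buffers(receiver, buffer_bytes)
    result = {}
    reader = threading.Thread(
        target=lambda: result.update(
            received=receive_exactly(receiver, total_bytes, chunk_size)))
    reader.start()
    result["calls"] = send_in_chunks(sender, bytes(total_bytes), chunk_size)
    reader.join()
    sender.close()
    receiver.close()
    return result

if __name__ == "__main__":
    for chunk in (512, 65536):
        print(chunk, loopback_demo(1 << 20, chunk))
```

Moving the same 1 MiB takes 2,048 application-level send calls with 512-byte writes and 16 with 64 KB writes, which is the per-call overhead the paragraph above refers to.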

Knowing that the largest impacts on throughput based on the SAS modeled data are TCP/IP
window size, processor power and whether or not another station is also transmitting,
four different transfer tests were conducted with each of the four file sizes. As shown below
in TABLE 7 and TABLE 8 on page 62, tests were conducted in the one processor
configuration and the two processor configuration while transferring files one-way and
two-way (between White and Gold simultaneously).


TABLE 7: RCP ONE PROCESSOR TRANSFER RESULTS

                     TINY           [illegible]    LARGE                HUGE
                     (6 bytes)                     (1,414,923 bytes)    (17,989,936 bytes)

ONE-WAY TRANSFER
White to Gold
  TO: /FILE-NAME     .000032 Mbps   .25 Mbps       4.91 Mbps            13.20 Mbps
  TO: /DEV/NULL      .000032 Mbps   .25 Mbps       5.85 Mbps            26.41 Mbps

TWO-WAY TRANSFERS
White to Gold & Gold to White
  TO: /FILE-NAME     .000027 Mbps   .23 Mbps       4.47 Mbps            11.49 Mbps
  TO: /DEV/NULL      .000027 Mbps   .22 Mbps       4.73 Mbps            16.72 Mbps

Also, files were transferred from disk to disk and from disk to /dev/null. This second
transfer method does not result in a disk write at the destination workstation. The device
driver, /dev/null, is used to dispose of files without needing to delete them. Files can be sent
to /dev/null and this device driver accepts the data without writing it to disk.

The largest impact seen in this set of tests was the file size. The lowest throughput rate
was observed when transferring the smallest file, TINY. This file has an associated
overhead of 90.9% when being transferred over FDDI. The highest throughput was seen
with the file HUGE. This file only had an overhead of 1.37% when transferred over FDDI.
These overhead figures include the overhead associated with the FDDI, IP and TCP
protocols. Another area with results similar to the nttcp tests is whether the transfers are
one-way or two-way. When the two workstations have to compete for the token, the
throughput drops.
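
The two overhead figures are consistent with a fixed per-frame header cost. The Python sketch below assumes roughly 60 bytes of combined FDDI, IP and TCP header per frame and about 4,312 bytes of data in a full frame (a 4,352-byte MTU less 40 bytes of IP and TCP header). Both numbers are assumptions chosen for illustration and are not stated in the test description.

```python
HEADER_BYTES = 60        # assumed FDDI + IP + TCP header bytes per frame
FULL_FRAME_DATA = 4312   # assumed data bytes in a full frame (4352-byte MTU less 40)

def overhead_percent(data_bytes_in_frame):
    """Share of the bytes on the wire that are protocol headers, in percent."""
    return 100 * HEADER_BYTES / (HEADER_BYTES + data_bytes_in_frame)

tiny = overhead_percent(6)                 # the 6-byte TINY file fits in one frame
huge = overhead_percent(FULL_FRAME_DATA)   # HUGE travels almost entirely in full frames

print(f"TINY overhead: {tiny:.1f}%")
print(f"HUGE overhead: {huge:.2f}%")
```

Under these assumptions the sketch reproduces the 90.9% and 1.37% figures quoted above.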


TABLE 8: RCP TWO PROCESSOR TRANSFER RESULTS

                     TINY           [illegible]    LARGE                HUGE
                     (6 bytes)                     (1,414,923 bytes)    (17,989,936 bytes)

ONE-WAY TRANSFER
White to Gold
  TO: /FILE-NAME     .000031 Mbps   .25 Mbps       4.94 Mbps            13.54 Mbps
  TO: /DEV/NULL      .000031 Mbps   .25 Mbps       5.87 Mbps            28.42 Mbps

ONE-WAY TRANSFER
Gold to White
  TO: /FILE-NAME     .000029 Mbps   .24 Mbps       [illegible]          21.66 Mbps
  TO: /DEV/NULL      .000029 Mbps   .24 Mbps       5.81 Mbps            29.82 Mbps

TWO-WAY TRANSFERS
White to Gold & Gold to White
  TO: /FILE-NAME     .000029 Mbps   .24 Mbps       4.64 Mbps            13.27 Mbps
  TO: /DEV/NULL      [illegible]    [illegible]    5.55 Mbps            23.18 Mbps

The results during the rcp tests were much lower than during the nttcp tests. As an
example, on the transfer of a file of over 17 Mbytes from Gold with two processors to
White:/dev/null, the best achieved throughput rate was 29.82 Mbps with rcp. This is only
29.82 percent of FDDI’s available bandwidth and only about 46 percent of the highest
achieved throughput using nttcp (65 Mbps). When transferring the same file from Gold to
White and writing the file to disk, the transfer rate was 21.66 Mbps. This rate is only about
73 percent of the rate achieved when transferring the data to /dev/null. Below in Figure 28
on page 63 is a graphical plot of the transfer rates just mentioned while transferring the
17.9 Mbyte file from Gold with two 50MHz processors to White with two 40MHz processors.

There were two main differences between the transfer methods. First, the rcp transfers
add another layer of protocol to the transfers. The rcp protocol hands off the data to be
transferred to the TCP/IP protocol layers. This of course increases the amount of overhead
transferred. Second, using rcp to transfer the data involves reading the data from disk
before it can be transferred. Even though large amounts of data can be cached in the
SuperCache 1-Mbyte external cache, this is not large enough for extremely large files
to be completely cached. During this test, each file was transferred 9 times and the
median throughput rate was used for the results.


The results from the rcp tests were largely as expected. The two processor
transfers were faster than the single processor transfers and the one-way transfers were
faster than the two-way transfers. However, the difference in these throughput rates was not
as large as that seen with the nttcp tests. Since the additional overhead from the rcp
command should affect the transfer rates evenly, the only other difference is that the data
was transferred from disk instead of being generated by the CPU. The large difference in
throughput rates achieved between the two test methods would indicate that disk access
is a very large bottleneck in throughput performance.

A quick comparison of the throughput rate observed using nttcp for a file size of
16,777,216 bytes (File C) and an rcp transfer of a file of 17,989,936 bytes shows a
throughput rate of 32.77 Mbps for the nttcp transfer and a throughput rate of 28.42 Mbps
for the rcp transfer to /dev/null. Both of these tests were one-way tests from White to Gold
with both systems in the two processor configuration. In this comparison, the rcp test had
a throughput rate which is 86.7 percent of the nttcp throughput rate. This seems to indicate
that the retrieval of the file from disk and the overhead of the rcp protocol are responsible
for a 13.3 percent reduction in throughput when transferring files.

When comparing the transfer rate of an rcp transfer from White to a file location on
Gold with the nttcp throughput rate, there is a much larger difference in throughput. The
nttcp throughput rate is still 32.77 Mbps and the throughput rate for the rcp file to file
transfer is 13.54 Mbps. Here the rcp throughput rate is 41.3 percent of the nttcp throughput
rate. This means that the time to receive and process the file at the destination workstation
accounts for about 45 percent of the reduced throughput. This is the 58.7 percent reduction
minus the 13.3 percent attributed to the retrieval of the file from disk and the overhead of
the rcp protocol.
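
The percentages in this comparison can be recomputed directly from the three measured rates. The Python sketch below uses only the rates quoted above. The variable names and the split into a source-side and a destination-side share are illustrative.

```python
nttcp_rate = 32.77     # Mbps, nttcp File C, White to Gold
rcp_to_null = 28.42    # Mbps, rcp of the 17,989,936-byte file to /dev/null
rcp_to_file = 13.54    # Mbps, rcp of the same file to a file on Gold

to_null_pct = 100 * rcp_to_null / nttcp_rate   # rcp to /dev/null relative to nttcp
to_file_pct = 100 * rcp_to_file / nttcp_rate   # rcp to disk relative to nttcp

# Reduction attributed to reading the file from disk plus rcp protocol overhead.
source_side_loss = 100 - to_null_pct
# Further reduction once the destination also has to write the file to disk.
destination_loss = to_null_pct - to_file_pct

print(f"rcp to /dev/null: {to_null_pct:.1f}% of nttcp")
print(f"rcp to file:      {to_file_pct:.1f}% of nttcp")
print(f"source side:      {source_side_loss:.1f} points")
print(f"destination side: {destination_loss:.1f} points")
```

Computed this way, the source-side share is about 13.3 points and the destination-side share about 45.4 points, which together make up the 58.7 percent total reduction.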

D.  ANALYSIS  SUMMARY 

The results from the Neal Nelson Benchmark showed that the systems being
investigated were functioning as expected. The 50MHz system outperformed the 40MHz
system and the two processor system outperformed the one processor system. One
unexpected result was that SunOS 4.1.3 slightly outperformed Solaris 2.3 in just about
every test except disk access to UNIX files. Solaris 2.3 was the clear winner in this area.

The nttcp results were analyzed using a linear multiple regression analysis model.
Even though the throughput results were not linear, the model is believed to be accurate
enough to show the relationship between the parameters being investigated. The analysis
of this data provides the most concrete results of the two throughput test methods.

The number of workstations transmitting on an FDDI network has the largest impact
on throughput among the parameters investigated according to the one processor and two
processor models. An example of this impact is to take the SAS prediction shown in Figure
25 on page 56 and change the parameter SINGLE from its one-way value to the two-way
value. This allows SAS to predict a new throughput rate based on all the previous values
except the change just noted. The result is a new throughput prediction of 48.8254 Mbps.
This is only 83.1 percent of the original throughput prediction of 58.7544 Mbps.

The power of the workstation itself is a major factor in throughput potential. This is
seen in the fact that the second largest impact on throughput in the one processor and two
processor model is whether or not the workstation had two processors. The result of the
new one processor prediction shows a throughput prediction of 49.2184 Mbps. This is
83.8 percent of the original throughput prediction of 58.7544 Mbps.
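
Because the model is linear, each of these what-if predictions is the maximum prediction less one coefficient. The Python sketch below reproduces both figures from the published estimates. The variable names are illustrative, and the sketch assumes each change lowers the variable by exactly one unit (SINGLE from 1 to 0, SD from 2 to 1).

```python
max_prediction = 58.7544   # Mbps, maximum predicted throughput
coef_single = 9.928996     # parameter estimate for SINGLE
coef_sd = 9.535964         # parameter estimate for SD

two_way = max_prediction - coef_single   # both stations transmitting
one_cpu = max_prediction - coef_sd       # one processor instead of two

two_way_pct = 100 * two_way / max_prediction
one_cpu_pct = 100 * one_cpu / max_prediction

print(f"two-way:       {two_way:.4f} Mbps ({two_way_pct:.1f}% of maximum)")
print(f"one processor: {one_cpu:.4f} Mbps ({one_cpu_pct:.1f}% of maximum)")
```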

Since the TCP/IP window size was limited in the model to a range of 20k to 44k, this
parameter appears to have less of an impact than it really has. In most cases, though, the
difference in throughput performance between a TCP/IP window size of 4k and a window
size greater than 20k is more significant than any other factor considered in this
investigation.

The results from the rcp tests are more of an observation of the effects of the disk drive
on throughput performance. Since both tests measure the time from the start of the test to
receiving the ack from TCP on the receiving workstation that the data has been received,
the only other real differences are the rcp protocol and the fact that the data is being
transferred as files instead of being generated by the processor.

As pointed out earlier, the overhead of the rcp protocol and the time spent retrieving
the file from disk is approximately 13.3 percent of the throughput rate observed during the
nttcp throughput tests. Additionally, the overhead of processing the file at the receiving
workstation is approximately 45 percent of the throughput rate observed during the nttcp
throughput tests.

The observation made in the nttcp tests that White with only 40MHz processors could
transfer data faster than Gold with 50MHz processors was not seen again in the rcp tests.
In the rcp tests, Gold was able to transfer data at a higher throughput rate than White when
Gold had the two 50MHz processors and White had the two 40MHz processors.

VI. CONCLUSIONS AND TOPICS FOR FUTURE RESEARCH


A.  CONCLUSION 

The objective of this research was to measure actual throughput between high
performance workstations over an FDDI network to determine what bottlenecks, if any,
exist between Sun Microsystems SPARC 10 multiprocessors running Solaris 2.3 and
Network Peripheral Inc.’s (NPI) FDDI network interface cards, and to evaluate
Transmission Control Protocol/Internet Protocol (TCP/IP) as a high speed transport
protocol.

At the beginning of this investigation there were many speculations as to what
throughput rates could be achieved and what effect varying the different tunable parameters
would have on the throughput rates. It was assumed that the workstation with the 50MHz
processors would have a faster throughput rate than the workstation with the 40MHz
processors. It was also assumed that since Sun Microsystems was encouraging their users
to switch from SunOS to Solaris, Solaris 2.3 would clearly outperform SunOS 4.1.3.

The  following  sections  outline  the  conclusions  drawn  from  these  investigations: 

1.  Workstation  Conclusions 

There were four benchmark tests conducted using the Neal Nelson Business
Benchmark run on the two workstations, Gold and White.

• Gold had two 50MHz processors installed and was running Solaris 2.3.

• Gold had one 50MHz processor installed and was running Solaris 2.3.

• Gold had one 50MHz processor installed and was running SunOS 4.1.3.

• White had two 40MHz processors installed and was running Solaris 2.3.

Three test comparisons were conducted by Neal Nelson and Associates and the
results can be summarized as follows:

• A workstation running Solaris 2.3 with two 50MHz processors can be expected
to outperform a workstation running Solaris 2.3 with two 40MHz processors
in most areas of performance by approximately 20 percent.

• A workstation running Solaris 2.3 with two 50MHz processors can be expected
to outperform a workstation running Solaris 2.3 with one 50MHz processor in
most areas of performance by approximately 90 percent.

• A workstation running SunOS 4.1.3 with one 50MHz processor can be
expected to outperform a workstation running Solaris 2.3 with one 50MHz
processor in most areas of performance by approximately 2 percent.


Of the three comparisons noted above, the first two results were expected.
However, it was assumed that Sun Microsystems’ release of Solaris 2.3 would result in
improved operating system performance, not a slight drop in performance. These results
were very important in the next step of the investigation. Knowing that the workstation with
two 50MHz processors should outperform the workstation with two 40MHz processors
helped isolate some unexpected results in workstation throughput.

2.  Throughput  Conclusions 

There were two methods used in this investigation to measure throughput. First,
a public domain network throughput measurement tool, New Test TCP (nttcp), was used
in order to minimize the workstation overhead. Next, the Remote Copy Protocol (rcp)
command was used in order to include all the overhead of daily distributed processing.
The results obtained from these two test methods were consistent with each other.

New Test TCP (nttcp): During the nttcp tests the following tunable parameters
were varied to determine their impact on throughput performance:

• TCP/IP window size, the amount of data that can be in transit at any one time
between workstations.

• sbf_num_llc_rx, the number of receive buffers (4k each) on the FDDI board
allotted for receiving data.

• nfs_async_threads, the number of asynchronous threads allotted for handling
network file system service.

• sbf_t_req, the amount of time allotted for each workstation to transfer data prior to
passing on the token. This is the TTRT.

• sbf_mtu, the maximum protocol packet size.

Additionally, the nttcp tests were run on both single processor configurations and
on two processor configurations. During this investigation the nttcp test results showed
that the four most significant impacts on throughput, in order of impact, were as
follows:

• Whether data was being transferred one-way or both workstations were
transferring data simultaneously.

• Whether the workstation had one or two processors.

• The number of 4K receive buffers allotted for receiving data.

• The size of the TCP/IP window available for sending data.

One note about the TCP/IP window size: during this investigation, TCP/IP
window sizes less than 20k and greater than 44k had too large a deviation in their
throughput results to be included in the final analysis. When all of the TCP/IP window
sizes are included, this parameter ends up having the largest impact on throughput rates.
The rest of the results retain the above order of impact on throughput.

The other tunable parameters varied during these tests had little impact on
throughput performance. Below are the rest of the factors affecting throughput in their
order of importance:

• The length of the buffers being transmitted. This equates to the size of the data
being transmitted.

• The Maximum Transmission Unit (MTU) size. This is the size of the FDDI
frames of data being transmitted.

• The number of NFS asynchronous threads allowed for servicing network file
service.

• The number of buffers (file size) being transmitted.


Remote Copy Protocol (rcp): During the rcp tests the tunable parameters were
varied, but there was no noticeable difference in the throughput rates. The TCP/IP
window size, which had the largest impact in the nttcp tests, did not have any noticeable
impact on throughput. The reason the TCP/IP window did not have an impact is that
rcp does small read()’s and write()’s, so the syscall overhead dominates over the time
spent in the kernel in TCP. If an application wants optimum bulk data throughput, it should
increase the receive buffering and also do moderately large read()’s and write()’s so that
the syscall overhead does not dominate.

The only difference between the nttcp tests and the rcp tests was the additional
overhead with the rcp disk transfers and the rcp protocol overhead. Therefore, the
conclusion can be drawn that one of these two differences accounted for the very large drop
in throughput between the nttcp tests and the rcp tests.

On the transfer of a file of over 17 Mbytes from White with two processors
to Gold, the best achieved throughput rate was 13.54 Mbps with rcp when the transferred
data is written to disk. This is only 13.54 percent of FDDI’s available bandwidth and only
41.3 percent of the highest achieved throughput using nttcp at the same TCP/IP window
size of 8k. Most of the resulting 58.7 percent difference between rcp and nttcp can be
attributed to the rcp protocol overhead. rcp has to go through a complete login, exec of the
user’s shell, and run through the user’s “.cshrc” or “.profile” on the server side before it
begins transferring any data. If the data transfer is not really huge, the time spent logging
in will be much greater than the time spent transferring the data.

B.  TOPICS  FOR  FUTURE  RESEARCH 

Several topics for further study can be derived from this investigation. All of them are
related to either improving throughput or to explaining events which were not explained in
this thesis.

Since the nttcp tests were only able to obtain a maximum throughput using TCP
transfers of 65 Mbps, 35 percent of the available bandwidth of FDDI is not being used.
What portion of this unused bandwidth is due to lack of processor power and what portion
is due to inefficiencies in the TCP/IP protocol?

This investigation primarily looked at throughput rates associated with TCP transfers,
not User Datagram Protocol (UDP) transfers. The UDP header is 8 bytes and
the TCP header is 20 bytes. Also, UDP is not a reliable transport protocol.
How much throughput can be achieved using UDP and what problems occur when
using an unreliable transport protocol?
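The header sizes above can be turned into a rough per-datagram efficiency estimate. The sketch below is a hedged illustration, not part of the thesis software: `payload_fraction` is a hypothetical helper, the 20-byte IP header assumes no IP options, and the 4352-byte datagram size used in the note is the IP MTU commonly specified for FDDI, which is an assumption about the configuration rather than a value taken from these tests.

```c
#include <assert.h>

/* Header sizes in bytes; IP_HDR assumes an IPv4 header without options. */
enum { IP_HDR = 20, TCP_HDR = 20, UDP_HDR = 8 };

/* Fraction of each IP datagram that is application payload, given
 * the datagram size and the transport header size in bytes. */
double payload_fraction(int datagram_bytes, int transport_hdr)
{
    int payload = datagram_bytes - IP_HDR - transport_hdr;
    if (datagram_bytes <= 0 || payload <= 0)
        return 0.0;
    return (double)payload / (double)datagram_bytes;
}
```

With a 4352-byte datagram, `payload_fraction(4352, TCP_HDR)` and `payload_fraction(4352, UDP_HDR)` differ by well under one percentage point, which suggests that header size alone would account for only a small part of any TCP versus UDP throughput difference.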

File transfers using rcp (invoked through the system() call) displayed a throughput rate of only 13.54 Mbps
when the transferred data was written to disk. What percentage of this bottleneck is caused
by the throughput rate of the SCSI-2 controller and what percentage is caused by other
overhead associated with file transfers?
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APPENDIX  A:  NTTCP  PROGRAM  and  TEST  SCRIPTS 


DUrr^H  Script 


#!/bin/sh
date > start

date > run1_start_time

ttest.sh 65536 512
ttest.sh 8192 512
ttest.sh 65536 1024
ttest.sh 8192 1024
ttest.sh 65536 2048
ttest.sh 8192 2048
ttest.sh 65536 4096
ttest.sh 8192 4096

date > run1_finish_time

mkdir run1
mv *.log *.out run1/.
mv *time run1/.

date > run2_start_time

ttest.sh 65536 512
ttest.sh 8192 512
ttest.sh 65536 1024
ttest.sh 8192 1024
ttest.sh 65536 2048
ttest.sh 8192 2048
ttest.sh 65536 4096
ttest.sh 8192 4096

date > run2_finish_time

mkdir run2
mv *.log *.out run2/.
mv *time run2/.

date > run3_start_time

ttest.sh 65536 512
ttest.sh 8192 512
ttest.sh 65536 1024
ttest.sh 8192 1024
ttest.sh 65536 2048
ttest.sh 8192 2048
ttest.sh 65536 4096
ttest.sh 8192 4096

date > run3_finish_time

mkdir run3
mv *.log *.out run3/.
mv *time run3/.

date > run4_start_time

ttest.sh 65536 512
ttest.sh 8192 512
ttest.sh 65536 1024
ttest.sh 8192 1024
ttest.sh 65536 2048
ttest.sh 8192 2048
ttest.sh 65536 4096
ttest.sh 8192 4096

date > run4_finish_time

mkdir run4
mv *.log *.out run4/.
mv *time run4/.

date > run5_start_time

ttest.sh 65536 512
ttest.sh 8192 512
ttest.sh 65536 1024
ttest.sh 8192 1024
ttest.sh 65536 2048
ttest.sh 8192 2048
ttest.sh 65536 4096
ttest.sh 8192 4096

date > run5_finish_time

mkdir run5
mv *.log *.out run5/.
mv *time run5/.

date > run6_start_time

ttest.sh 65536 512
ttest.sh 8192 512
ttest.sh 65536 1024
ttest.sh 8192 1024
ttest.sh 65536 2048
ttest.sh 8192 2048
ttest.sh 65536 4096
ttest.sh 8192 4096

date > run6_finish_time

mkdir run6
mv *.log *.out run6/.
mv *time run6/.

date > finish


TTEST.SH Script


#!/bin/sh
#
# Use nttcp to test network throughput
# Usage: ttest.sh byte_per_write number_of_writes
#
DATALEN=$1
NPKTS=$2

# White to Gold
RECHOST=131.120.1.2
RSH=/usr/ucb/rsh
NTTCP=nttcp

rm -f ttest.out
rm -f ttest.tran.log
rm -f ttest.recv.log

# from 4KB to 60KB windows in steps of 8KB
SIZE=4

while test $SIZE -lt 61
do
    $RSH $RECHOST $NTTCP -r -w$SIZE > tmp1 2>&1 &
    sleep 5
    $NTTCP -t -l$DATALEN -n$NPKTS -w$SIZE $RECHOST >> ttest.tran.log 2>&1
    sleep 5
    grep 'Mb/s' tmp1 | awk '{print '$SIZE'*1024, $12}' >> ttest.out
    cat tmp1 >> ttest.recv.log
    SIZE=`expr $SIZE + 8`
done

rm -f tmp1
mv ttest.out ttest.$DATALEN.$NPKTS.out
mv ttest.tran.log ttest.$DATALEN.$NPKTS.tran.log
mv ttest.recv.log ttest.$DATALEN.$NPKTS.recv.log
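The test script pulls the throughput figure out of the receiver's summary line by field position. As a hedged alternative sketch in C (the function name `parse_mbps` is hypothetical and not part of the thesis software), the same value can be located by searching for the last `=` on a line that ends in `Mb/s`, which does not depend on the exact number of fields:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the Mb/s figure from a summary line of the form
 * "ttcp-r: N bytes in T real seconds = K KB/sec = M Mb/s".
 * Returns 1 and stores M in *mbps on success, 0 otherwise. */
int parse_mbps(const char *line, double *mbps)
{
    const char *eq = strrchr(line, '=');   /* last '=' precedes the Mb/s value */
    if (eq == NULL || strstr(eq, "Mb/s") == NULL)
        return 0;
    return sscanf(eq + 1, "%lf", mbps) == 1;
}
```

The sample line format is taken from the summary `fprintf` in the NTTCP listing; lines that lack an `=` or the `Mb/s` unit are rejected rather than guessed at.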
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NTTCP  Program 


/*
 *  N T T C P . C
 *
 * Test TCP connection.  Makes a connection on port 2000
 * and transfers zero buffers or data copied from stdin.
 *
 * Usable on 4.2, 4.3, and 4.1a systems by defining one of
 * BSD42 BSD43 (BSD41a)
 *
 * Modified for operation under 4.2BSD, 18 Dec 84
 *      T.C. Slattery, USNA
 * Minor improvements, Mike Muuss and Terry Slattery, 16-Oct-85.
 *
 * Modified on 5 Apr 94 for operation under Solaris 2.3 based on changes
 * for the TTCP.C program provided by Don Merritt of ARL.
 *      CPT Mark Schivley, USA
 */

#ifndef lint
static char RCSid[] = "@(#)$Header: /src/opt/brl/sbin/ttcp/RCS/ttcp.c,v 1.2 1993/11/30 20:15:39 root Exp $ (BRL)";
#endif

#define BSD43
/* #define BSD42 */
/* #define BSD41a */

#include <stdio.h>
#include <ctype.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <sys/time.h>       /* struct timeval */

#ifdef SYSV
#include <sys/times.h>
#include <sys/param.h>
#else
#include <sys/resource.h>
#endif

#ifdef SYSV
#define bcopy(s, d, l)  memcpy(d, s, (size_t) l)
#define bzero(s, l)     memset(s, 0, (size_t) l)
#endif

struct sockaddr_in sinme;
struct sockaddr_in sinhim;
struct sockaddr_in sindum;
struct sockaddr_in frominet;

int domain, fromlen;
int fd;                     /* fd of network socket */

int sendwin = 32 * 1024;
int rcvwin = 32 * 1024;
int optlen = sizeof(int);
int buflen = 1024;          /* length of buffer */
char *buf;                  /* ptr to dynamic buffer */
int nbuf = 1024;            /* number of buffers to send in sinkmode */

int udp = 0;                /* 0 = tcp, !0 = udp */
int options = 0;            /* socket options */
int one = 1;                /* for 4.3 BSD style setsockopt() */
short port = 2001;          /* TCP port number */
char *host;                 /* ptr to name of host */
int trans;                  /* 0=receive, !0=transmit mode */
int sinkmode = 1;           /* 0=normal I/O, !0=sink/source mode */
int verbose = 0;
int nodelay = 0;            /* set TCP_NODELAY socket option */
int window = 0;             /* 0=use default, 1=set to specified size */

struct hostent *addr;
extern int errno;

char Usage[] = "\
Usage: ttcp -t [-options] host <in\n\
    -l##    length of bufs written to network (default 1024)\n\
    -s      don't source a pattern to network, use stdin\n\
    -n##    number of bufs written to network (-s only, default 1024)\n\
    -p##    port number to send to (default 2000)\n\
    -u      use UDP instead of TCP\n\
Usage: ttcp -r [-options] >out\n\
    -l##    length of network read buf (default 1024)\n\
    -s      sink (discard) all data from network\n\
    -p##    port number to listen at (default 2000)\n\
    -B      Only output full blocks, as specified in -l## (for TAR)\n\
    -u      use UDP instead of TCP\n\
";

char stats[128];
double t;                   /* transmission time */
long nbytes;                /* bytes on net */
int b_flag = 0;             /* use mread() */
void prep_timer();
double read_timer();
double cput, realt;         /* user, real time (seconds) */

main(argc, argv)
int argc;
char **argv;
{
    unsigned long addr_tmp;

    if (argc < 2) goto usage;

    argv++; argc--;
    while (argc > 0 && argv[0][0] == '-') {
        switch (argv[0][1]) {
        case 'B':
            b_flag = 1;
            break;
        case 't':
            trans = 1;
            break;
        case 'r':
            trans = 0;
            break;
        case 'd':
            options |= SO_DEBUG;
            break;
        case 'n':
            nbuf = atoi(&argv[0][2]);
            break;
        case 'l':
            buflen = atoi(&argv[0][2]);
            break;
        case 'w':
            window = 1;
            sendwin = 1024 * atoi(&argv[0][2]);
            rcvwin = 1024 * atoi(&argv[0][2]);
            break;
        case 's':
            sinkmode = 1;   /* source or sink, really */
            break;
        case 'p':
            port = atoi(&argv[0][2]);
            break;
        case 'u':
            udp = 1;
            break;
        default:
            goto usage;
        }
        argv++; argc--;
    }

    if (trans) {
        /* xmitr */
        if (argc != 1) goto usage;
        bzero((char *)&sinhim, sizeof(sinhim));
        host = argv[0];
        if (atoi(host) > 0) {
            /* Numeric */
            sinhim.sin_family = AF_INET;
#ifdef cray
            addr_tmp = inet_addr(host);
            sinhim.sin_addr = addr_tmp;
#else
            sinhim.sin_addr.s_addr = inet_addr(host);
#endif
        } else {
            if ((addr = gethostbyname(host)) == NULL)
                err("bad hostname");
            sinhim.sin_family = addr->h_addrtype;
            bcopy(addr->h_addr, (char *)&addr_tmp, addr->h_length);
#ifdef cray
            sinhim.sin_addr = addr_tmp;
#else
            sinhim.sin_addr.s_addr = addr_tmp;
#endif /* cray */
        }
        sinhim.sin_port = htons(port);
        sinme.sin_port = 0;     /* free choice */
    } else {
        /* rcvr */
        sinme.sin_port = htons(port);
    }

    if ((buf = (char *)malloc(buflen)) == (char *)NULL)
        err("malloc");
    fprintf(stderr, "ttcp%s: nbuf=%d, buflen=%d, port=%d\n",
        trans ? "-t" : "-r",
        nbuf, buflen, port);

    if ((fd = socket(AF_INET, udp ? SOCK_DGRAM : SOCK_STREAM, 0)) < 0)
        err("socket");
    mes("socket");

    /* Try the getsockopt & setsockopt to Solaris here */
#ifndef SOLARIS
    if (bind(fd, &sinme, sizeof(sinme)) < 0)
        err("bind");
#else
    /*
     * Under Solaris, calling connect() on a stream socket binds the
     * socket to an address.  If a bind() is done before the connect(),
     * an error "connect: Address family not supported by protocol family"
     * results.  Only call bind() for the cases where you're not going
     * to call connect().
     */
    if (udp || (!udp && !trans))
        if (bind(fd, (struct sockaddr *) &sinme, sizeof(sinme)) < 0)
            err("bind");
#endif /* SOLARIS */
    if (!udp) {
        if (trans) {
            /* We are the client if transmitting */
            if (options) {
#ifdef BSD42
                if (setsockopt(fd, SOL_SOCKET, options, 0, 0) < 0)
#else /* BSD43 */
#ifndef SOLARIS
                if (setsockopt(fd, SOL_SOCKET, options, &one, sizeof(one)) < 0)
#else
                if (setsockopt(fd, SOL_SOCKET, options, (char *) &one, sizeof(one)) < 0)
#endif /* SOLARIS */
#endif
                    err("setsockopt");
            }
#ifndef SOLARIS
            if (connect(fd, &sinhim, sizeof(sinhim)) < 0) {
#else
            if (connect(fd, (struct sockaddr *) &sinhim, sizeof(sinhim)) < 0) {
#endif /* SOLARIS */
                err("connect");
            }
            mes("connect");
            if (window) {
                if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, (char *) &sendwin,
                    sizeof(sendwin)) < 0)
                    printf("get send window size didn't work\n");
                if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, (char *) &rcvwin,
                    sizeof(rcvwin)) < 0)
                    printf("get rcv window size didn't work\n");
                /* read back the send buffer size with SO_SNDBUF */
                if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, (char *) &sendwin, &optlen) < 0)
                    printf("get send window size didn't work\n");
                else printf("send window size = %d\n", sendwin);
                if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, (char *) &rcvwin, &optlen) < 0)
                    printf("get rcv window size didn't work\n");
                else printf("receive window size = %d\n", rcvwin);
            }

        } else {
            /* otherwise, we are the server and
             * should listen for the connections
             */
#ifndef SOLARIS
            listen(fd, 0);      /* allow a queue of 0 */
#else
            /*
             * Under Solaris, specifying a queue length of 0
             * results in a "connection refused".
             */
            listen(fd, 1);
#endif /* SOLARIS */
            if (options) {
#ifdef BSD42
                if (setsockopt(fd, SOL_SOCKET, options, 0, 0) < 0)
#else /* BSD43 */
#ifndef SOLARIS
                if (setsockopt(fd, SOL_SOCKET, options, &one, sizeof(one)) < 0)
#else
                if (setsockopt(fd, SOL_SOCKET, options, (char *) &one, sizeof(one)) < 0)
#endif /* SOLARIS */
#endif
                    err("setsockopt");
            }
            fromlen = sizeof(frominet);
            domain = AF_INET;
#ifndef SOLARIS
            if ((fd = accept(fd, &frominet, &fromlen)) < 0)
#else
            if ((fd = accept(fd, (struct sockaddr *) &frominet, &fromlen)) < 0)
#endif /* SOLARIS */
                err("accept");
            mes("accept");
            if (window) {
                if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, (char *) &sendwin,
                    sizeof(sendwin)) < 0)
                    printf("get send window size didn't work\n");
                if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, (char *) &rcvwin,
                    sizeof(rcvwin)) < 0)
                    printf("get rcv window size didn't work\n");
                /* read back the send buffer size with SO_SNDBUF */
                if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, (char *) &sendwin, &optlen) < 0)
                    printf("get send window size didn't work\n");
                else printf("send window size = %d\n", sendwin);
                if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, (char *) &rcvwin, &optlen) < 0)
                    printf("get rcv window size didn't work\n");
                else printf("receive window size = %d\n", rcvwin);
            }
        }
    }

    prep_timer();
    errno = 0;
    if (sinkmode) {
        register int cnt;
        if (trans) {
            pattern(buf, buflen);
            if (udp) (void)Nwrite(fd, buf, 4);  /* rcvr start */
            while (nbuf-- && Nwrite(fd, buf, buflen) == buflen)
                nbytes += buflen;
            if (udp) (void)Nwrite(fd, buf, 4);  /* rcvr end */
        } else {
            while ((cnt = Nread(fd, buf, buflen)) > 0) {
                static int going = 0;
                if (cnt <= 4) {
                    if (going)
                        break;      /* "EOF" */
                    going = 1;
                    prep_timer();
                } else
                    nbytes += cnt;
            }
        }
    } else {
        register int cnt;
        if (trans) {
            while ((cnt = read(0, buf, buflen)) > 0 &&
                Nwrite(fd, buf, cnt) == cnt)
                nbytes += cnt;
        } else {
            while ((cnt = Nread(fd, buf, buflen)) > 0 &&
                write(1, buf, cnt) == cnt)
                nbytes += cnt;
        }
    }
    if (errno) err("IO");
    (void)read_timer(stats, sizeof(stats));
    if (udp && trans) {
        (void)Nwrite(fd, buf, 4);   /* rcvr end */
        (void)Nwrite(fd, buf, 4);   /* rcvr end */
        (void)Nwrite(fd, buf, 4);   /* rcvr end */
        (void)Nwrite(fd, buf, 4);   /* rcvr end */
    }
    fprintf(stdout,
        "ttcp%s: %ld bytes in %.2f real seconds = %.2f KB/sec = %.4f Mb/s\n",
        trans ? "-t" : "-r",
        nbytes, realt, ((double)nbytes)/realt/1024,
        ((double)nbytes)/realt/128000);
    if (verbose) {
        fprintf(stdout,
            "ttcp%s: %ld bytes in %.2f CPU seconds = %.2f KB/cpu sec\n",
            trans ? "-t" : "-r",
            nbytes, cput, ((double)nbytes)/cput/1024);
    }
    exit(0);

usage:
    fprintf(stderr, Usage);
    exit(1);
}

err(s)
char *s;
{
    fprintf(stderr, "ttcp%s: ", trans ? "-t" : "-r");
    perror(s);
    fprintf(stderr, "errno=%d\n", errno);
    exit(1);
}

mes(s)
char *s;
{
    fprintf(stderr, "ttcp%s: %s\n", trans ? "-t" : "-r", s);
}

pattern(cp, cnt)
register char *cp;
register int cnt;
{
    register char c;
    c = 0;
    while (cnt-- > 0) {
        while (!isprint((c & 0x7F))) c++;
        *cp++ = (c++ & 0x7F);
    }
}

/******* timing *********/

#ifdef SYSV
extern long time();
#if sgi
static void tvsub();
static struct timeval time0;    /* Time at which timing started */
#else
static long time0;
#endif
static struct tms tms0;
#else
static struct timeval time0;    /* Time at which timing started */
static struct rusage ru0;       /* Resource utilization at the start */
static void prusage();
static void tvadd();
static void tvsub();
static void psecs();
#endif

/*
 *  P R E P _ T I M E R
 */
void
prep_timer()
{
#ifdef SYSV
#if sgi
    gettimeofday(&time0, (struct timezone *)0);
#else
    (void)time(&time0);
#endif
    (void)times(&tms0);
#else
    gettimeofday(&time0, (struct timezone *)0);
    getrusage(RUSAGE_SELF, &ru0);
#endif
}

/*
 *  R E A D _ T I M E R
 *
 */
double
read_timer(str, len)
char *str;
{
#ifdef SYSV
    long now;
    struct tms tmsnow;
    char line[132];
#ifdef sgi
    struct timeval timedol;
    struct timeval td;

    gettimeofday(&timedol, (struct timezone *)0);
    tvsub(&td, &timedol, &time0);
    realt = td.tv_sec + ((double)td.tv_usec) / 1000000;
#else
    (void)time(&now);
    realt = now - time0;
#endif
    (void)times(&tmsnow);
    cput = tmsnow.tms_utime - tms0.tms_utime;
    cput /= HZ;
    if (cput < 0.00001) cput = 0.01;
    if (realt < 0.00001) realt = cput;
    sprintf(line, "%g CPU secs in %g elapsed secs (%g%%)",
        cput, realt,
        cput/realt*100);
    (void)strncpy(str, line, len);
    return(cput);
#else
    /* BSD */
    struct timeval timedol;
    struct rusage ru1;
    struct timeval td;
    struct timeval tend, tstart;
    char line[132];

    getrusage(RUSAGE_SELF, &ru1);
    gettimeofday(&timedol, (struct timezone *)0);
    prusage(&ru0, &ru1, &timedol, &time0, line);
    (void)strncpy(str, line, len);

    /* Get real time */
    tvsub(&td, &timedol, &time0);
    realt = td.tv_sec + ((double)td.tv_usec) / 1000000;

    /* Get CPU time (user+sys) */
    tvadd(&tend, &ru1.ru_utime, &ru1.ru_stime);
    tvadd(&tstart, &ru0.ru_utime, &ru0.ru_stime);
    tvsub(&td, &tend, &tstart);
    cput = td.tv_sec + ((double)td.tv_usec) / 1000000;
    if (cput < 0.00001) cput = 0.00001;
    return(cput);
#endif
}

#ifndef SYSV
static void
prusage(r0, r1, e, b, outp)
    register struct rusage *r0, *r1;
    struct timeval *e, *b;
    char *outp;
{
    struct timeval tdiff;
    register time_t t;
    register char *cp;
    register int i;
    int ms;

    t = (r1->ru_utime.tv_sec-r0->ru_utime.tv_sec)*100+
        (r1->ru_utime.tv_usec-r0->ru_utime.tv_usec)/10000+
        (r1->ru_stime.tv_sec-r0->ru_stime.tv_sec)*100+
        (r1->ru_stime.tv_usec-r0->ru_stime.tv_usec)/10000;
    ms = (e->tv_sec-b->tv_sec)*100 + (e->tv_usec-b->tv_usec)/10000;

#define END(x)  {while(*x) x++;}
    cp = "%Uuser %Ssys %Ereal %P %Xi+%Dd %Mmaxrss %F+%Rpf %Ccsw";
    for (; *cp; cp++) {
        if (*cp != '%')
            *outp++ = *cp;
        else if (cp[1]) switch (*++cp) {
        case 'U':
            tvsub(&tdiff, &r1->ru_utime, &r0->ru_utime);
            sprintf(outp, "%d.%01d", tdiff.tv_sec, tdiff.tv_usec/100000);
            END(outp);
            break;
        case 'S':
            tvsub(&tdiff, &r1->ru_stime, &r0->ru_stime);
            sprintf(outp, "%d.%01d", tdiff.tv_sec, tdiff.tv_usec/100000);
            END(outp);
            break;
        case 'E':
            psecs(ms / 100, outp);
            END(outp);
            break;
        case 'P':
            sprintf(outp, "%d%%", (int) (t*100 / ((ms ? ms : 1))));
            END(outp);
            break;
        case 'W':
            i = r1->ru_nswap - r0->ru_nswap;
            sprintf(outp, "%d", i);
            END(outp);
            break;
        case 'X':
            sprintf(outp, "%d", t == 0 ? 0 : (r1->ru_ixrss-r0->ru_ixrss)/t);
            END(outp);
            break;
        case 'D':
            sprintf(outp, "%d", t == 0 ? 0 :
                (r1->ru_idrss+r1->ru_isrss-(r0->ru_idrss+r0->ru_isrss))/t);
            END(outp);
            break;
        case 'K':
            sprintf(outp, "%d", t == 0 ? 0 :
                ((r1->ru_ixrss+r1->ru_isrss+r1->ru_idrss) -
                (r0->ru_ixrss+r0->ru_idrss+r0->ru_isrss))/t);
            END(outp);
            break;
        case 'M':
            sprintf(outp, "%d", r1->ru_maxrss/2);
            END(outp);
            break;
        case 'F':
            sprintf(outp, "%d", r1->ru_majflt-r0->ru_majflt);
            END(outp);
            break;
        case 'R':
            sprintf(outp, "%d", r1->ru_minflt-r0->ru_minflt);
            END(outp);
            break;
        case 'I':
            sprintf(outp, "%d", r1->ru_inblock-r0->ru_inblock);
            END(outp);
            break;
        case 'O':
            sprintf(outp, "%d", r1->ru_oublock-r0->ru_oublock);
            END(outp);
            break;
        case 'C':
            sprintf(outp, "%d+%d", r1->ru_nvcsw-r0->ru_nvcsw,
                r1->ru_nivcsw-r0->ru_nivcsw);
            END(outp);
            break;
        }
    }
    *outp = '\0';
}

static void
tvadd(tsum, t0, t1)
    struct timeval *tsum, *t0, *t1;
{
    tsum->tv_sec = t0->tv_sec + t1->tv_sec;
    tsum->tv_usec = t0->tv_usec + t1->tv_usec;
    if (tsum->tv_usec > 1000000)
        tsum->tv_sec++, tsum->tv_usec -= 1000000;
}

static void
tvsub(tdiff, t1, t0)
    struct timeval *tdiff, *t1, *t0;
{
    tdiff->tv_sec = t1->tv_sec - t0->tv_sec;
    tdiff->tv_usec = t1->tv_usec - t0->tv_usec;
    if (tdiff->tv_usec < 0)
        tdiff->tv_sec--, tdiff->tv_usec += 1000000;
}

static void
psecs(l, cp)
long l;
register char *cp;
{
    register int i;

    i = l / 3600;
    if (i) {
        sprintf(cp, "%d:", i);
        END(cp);
        i = l % 3600;
        sprintf(cp, "%d%d", (i/60) / 10, (i/60) % 10);
        END(cp);
    } else {
        i = l;
        sprintf(cp, "%d", i / 60);
        END(cp);
    }
    i %= 60;
    *cp++ = ':';
    sprintf(cp, "%d%d", i / 10, i % 10);
}
#endif

/*
 *  N R E A D
 */
Nread(fd, buf, count)
{
    struct sockaddr_in from;
    int len = sizeof(from);
    register int cnt;
    if (udp) {
        cnt = recvfrom(fd, (char *) buf, count, 0, (struct sockaddr *) &from, &len);
    } else {
        if (b_flag)
            cnt = mread(fd, buf, count);    /* fill buf */
        else
            cnt = read(fd, buf, count);
    }
    return(cnt);
}

/*
 *  N W R I T E
 */
Nwrite(fd, buf, count)
{
    register int cnt;
    if (udp) {
again:
        cnt = sendto(fd, (char *) buf, count, 0, (struct sockaddr *) &sinhim,
            sizeof(sinhim));
        if (cnt < 0 && errno == ENOBUFS) {
            delay(18000);
            errno = 0;
            goto again;
        }
    } else {
        cnt = write(fd, buf, count);
    }
    return(cnt);
}

delay(us)
{
    struct timeval tv;

    tv.tv_sec = 0;
    tv.tv_usec = us;
    (void)select(1, (fd_set *)0, (fd_set *)0, (fd_set *)0, &tv);
    return(1);
}

/*
 *  M R E A D
 *
 * This function performs the function of a read(II) but will
 * call read(II) multiple times in order to get the requested
 * number of characters.  This can be necessary because
 * network connections don't deliver data with the same
 * grouping as it is written with.  Written by Robert S. Miles, BRL.
 */
int
mread(fd, bufp, n)
int fd;
register char *bufp;
unsigned n;
{
    register unsigned count = 0;
    register int nread;

    do {
        nread = read(fd, bufp, n-count);
        if (nread < 0) {
            perror("ttcp_mread");
            return(-1);
        }
        if (nread == 0)
            return((int)count);
        count += (unsigned)nread;
        bufp += nread;
    } while (count < n);

    return((int)count);
}

#if sgi
static void
tvsub(tdiff, t1, t0)
    struct timeval *tdiff, *t1, *t0;
{
    tdiff->tv_sec = t1->tv_sec - t0->tv_sec;
    tdiff->tv_usec = t1->tv_usec - t0->tv_usec;
    if (tdiff->tv_usec < 0)
        tdiff->tv_sec--, tdiff->tv_usec += 1000000;
}
#endif
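The summary line printed by the program divides the byte rate by 128000, so one "Mb" in its output is 1024 x 1000 bits rather than 10^6 bits. The following C sketch makes that unit choice explicit next to a decimal-megabit version. The function names `ttcp_mbps` and `decimal_mbps` are hypothetical and introduced here only for illustration.

```c
#include <assert.h>

/* Throughput as the listing reports it: bytes / seconds / 128000,
 * i.e. one "Mb" here is 1024 * 1000 bits. */
double ttcp_mbps(long nbytes, double real_sec)
{
    if (real_sec <= 0.0)
        return 0.0;
    return (double)nbytes / real_sec / 128000.0;
}

/* The same measurement in decimal megabits (10^6 bits) per second. */
double decimal_mbps(long nbytes, double real_sec)
{
    if (real_sec <= 0.0)
        return 0.0;
    return (double)nbytes * 8.0 / real_sec / 1e6;
}
```

The two figures differ by a constant factor of 1.024, so rates printed by the program read about 2.3 percent lower than the same rate expressed in units of 10^6 bits per second, which is worth keeping in mind when comparing against the 100 Mbps physical rate.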
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APPENDIX  B:  RCP  PROGRAM 


#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/time.h>

main ()
{
    long elapsed_sec,           /* Seconds variable */
         elapsed_usec;          /* Microseconds variable */
    int file_size;
    float total_time,
          part_usec,
          transfer_rate;
    float average_time = 0;
    int loop_counter,
        a,                      /* Subroutine result variables */
        b;
    int n = 5;
    char name[30], system_name[30];
    char rcp_string[80] = "rcp";    /* large enough for "rcp", name and destination */
    char blank_string[2] = " ";
    int true = 1;
    char answer[2];
    char *get_name(char *string);

    /* Variable structure defns */
    struct timeval timestart, timedone;
    struct timezone zonestart, zonedone;

    /* Get file name & Dest machine name & path */
    printf("\n\n\n Here is a list of available files for transferring: \n\n");
    system("ls -al");

    while (answer[0] != 'y')
    {
        printf("\n Input the file name to be transferred: \n\n");
        gets(name);
        printf("\n Is the below input correct? Enter y if yes or n if incorrect: \n\n");
        puts(name);
        printf("\n");
        gets(answer);
    }
    answer[0] = 'n';    /* reset for next loop */

    /* Get file size */
    while (answer[0] != 'y')
    {
        printf("\n Input the file size to be transferred: \n\n");
        scanf("%d", &file_size);
        printf("\n Is the below input correct? Enter y if yes or n if incorrect: \n\n");
        printf("%d\n", file_size);
        gets(answer);
        gets(answer);
        answer[0] = 'y';
    }

    answer[0] = 'n';    /* reset for next loop */

    while (answer[0] != 'y')
    {
        printf("\n Input the Dest machine name & path to be transferred: \n\n");
        printf("\n An example would be: gold-fddi:/usr/test/wtog_test\n\n");
        gets(system_name);
        printf("\n Is the below input correct? Enter y if yes or n if incorrect: \n\n");
        puts(system_name);
        printf("\n");
        gets(answer);
    }

    strcat(rcp_string, blank_string);
    strcat(rcp_string, name);
    strcat(rcp_string, blank_string);
    strcat(rcp_string, system_name);

    /* Set up outer loop to execute transfers n times */
    for (loop_counter = 1; loop_counter <= n; loop_counter++)
    {
        /* Get start time in sec&usec and check if successful */
        a = gettimeofday(&timestart, &zonestart);
        if (a != 0)
            printf("Oops! %d\n", a);

        /* Use system call to do file transfer */
        system(rcp_string);
        /* system ("rcp americanjikiiu gold-fddi:/usr/test/wtog_test"); */

        /* Get stop time in sec&usec and check if successful */
        b = gettimeofday(&timedone, &zonedone);
        if (b != 0)
            printf("Oops! %d\n", b);

        /* Get structure values for calculations. */
        elapsed_sec = timedone.tv_sec - timestart.tv_sec;
        elapsed_usec = timedone.tv_usec - timestart.tv_usec;

        /* Make sure that we account for the usec */
        /* variable rolling over (through zero) */
        if (elapsed_sec >= 1)
        {
            if (elapsed_usec < 0)
            {
                elapsed_sec -= 1;
                elapsed_usec += 1000000;
            }
        }

        /* Convert the usec variable to a floating point number. */
        part_usec = elapsed_usec/1.0e6;

        /* Add the seconds to the microseconds to get a real number */
        total_time = elapsed_sec + part_usec;

        /* And print the results on the CRT */
        printf("%f \t%f\n", total_time, ((file_size*8/total_time)/1000000));
        average_time += total_time;
    }

    /* Print out the results of the avg transfer rate */
    printf("\n\nIs this time correct? %f", average_time);
    printf("\nThe average time was %f and the average transfer rate was %f\n", average_time/n,
        ((file_size*8/total_time)/1000000));

    /* This is the end of the control loop. */
    exit(0);
}
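The elapsed-time arithmetic in the listing, including the borrow from the seconds field when the microsecond difference goes negative, can be isolated into a single helper. This is a minimal sketch under the assumption that both samples come from `gettimeofday()`; the function name `elapsed_seconds` is hypothetical and not part of the thesis program.

```c
#include <assert.h>
#include <sys/time.h>

/* Elapsed seconds between two gettimeofday() samples, borrowing one
 * second when the microsecond field of the later sample is smaller. */
double elapsed_seconds(const struct timeval *start, const struct timeval *done)
{
    long sec = (long)(done->tv_sec - start->tv_sec);
    long usec = (long)(done->tv_usec - start->tv_usec);
    if (usec < 0) {
        sec -= 1;
        usec += 1000000L;
    }
    return (double)sec + (double)usec / 1.0e6;
}
```

Keeping the borrow in one place avoids repeating the rollover check at each timing site and returns the interval directly as a floating-point number of seconds.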
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APPENDIX  C:  NEAL  NELSON  BENCHMARK  RESULTS 


TABLE 9: CPU SUBSYSTEM

                                      White         Gold
CPU Type                              Sparc         Sparc
CPU Clock Speed                       45 MHz        50 MHz
Total Size of Main Memory             224 Mbytes    224 Mbytes
Speed of Main Memory Chips            80 ns         80 ns
Type and Speed of Math Coprocessor    None          None
Number of Main CPUs                   2             2

TABLE 10: DISK SUBSYSTEM

                                      White                      Gold
Total Number of Disk Controllers      1                          1
Total Number of Disk Devices          2                          2
Disk Drive Type                       SCSI                       SCSI
Disk Drive Brand/Model                Seagate; Seagate ST11200   Seagate; Seagate ST1480
Disk Average Seek Time                1: 10.5 ms                 1: 10.5 ms; 2: 10.5 ms
Does system have I/O buses separate
from the main bus?                    Yes                        Yes

TABLE 11: CACHE INFORMATION

                                                      White           Gold
Does the system have instruction or data cache?       Yes             Yes
How many levels of instruction/data cache are there?  2               2
How is cache coherency accomplished?                  Snooping with   Snooping with
                                                      invalidation    invalidation
Does CPU have separate instruction and data caches?   Yes             Yes
Total size of all instruction/data caches:
    On-board Instruction                              20 Kbytes       20 Kbytes
    Data                                              16 Kbytes       16 Kbytes
    (Note: External SuperCache controller provides
    1 Mbyte external cache)
Total swap                                            approx 280      approx 280
                                                      Mbytes          Mbytes

Group 1: Tests a mix of activities that are intended to approximate the processing
activities for the following five types of users. Group 1 includes the following tests:


1)  Simulated  Office  Automation  Workload 

2)  Simulated  Database  Workload 

3)  Simulated  Software  Development  Workload 

4)  Simulated  Transaction  Processing  Workload 

5)  Simulated  Calculation  Workload  (Math/Statistics/CAD/CAM) 

Group  2:  Tests  designed  to  perform  various  types  of  calculation  tasks  and  thereby 
profile  the  performance  of  the  computer’s  calculation  subsystem.  Group  2  includes  the 
following  tests: 
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6)  Write  to  Shared  Memory 

7)  Read  from  Memory,  Small  Instruction  Area,  Small  Data  Area 
8)  Read  from  Memory,  Small  Instruction  Area,  Larger  Data  Area

9)  Read  from  Memory,  Larger  Instruction  Area,  Small  Data  Area 

10)  Read  from  Memory,  Larger  Instruction  Area,  Larger  Data  Area 

11)  Make  Machine  Page  or  Swap  with  'malloc'  and  'free'

12)  Combined  Integer  and  Floating  Point  Math 

13)  Math  Library  Functions 

14)  Semaphores,  Shared  Memory,  Context  Switch 

15)  Write  to  and  Read  from  Pipes,  Context  Switch 

16)  Sample  System  Calls 

17)  Increasing  Depth  of  Function  Calls 


Group  3:  Tests  that  perform  a  series  of  disk  input  and  output  functions  to  profile  the 
performance  of  the  disk  subsystem.  Group  3  includes  the  following  tests: 


18) 1024 byte Sequential Reads from Unix File(s)

19) 1024 byte Sequential Writes to Unix File(s)

20) 8192 byte Sequential Reads from Unix File(s)

21) 8192 byte Sequential Writes to Unix File(s)

22) 4096 byte Synchronized Reads from Unix File(s)

23) 4096 byte Synchronized Reads from Raw Device(s)

24) 16384 byte Synchronized Reads from Unix File(s)

25) 16384 byte Synchronized Reads from Raw Device(s)

26) 4096 byte Pseudo Random Reads from Unix File(s)

27) 4096 byte Pseudo Random Reads from Raw Device(s)

28) Profile Disk Cache for Unix File(s)

29) Profile Disk Cache for Raw Device(s)

30) 8192 byte Sequential Writes then 'sync'
Gold Versus White, Two Processors


TABLE 12: GOLD2.SOL VRS WHITE2.SOL, TEST 1 & 2 & 3 & 4


TABLE 13: GOLD2.SOL VRS WHITE2.SOL, TEST 5 & 6 & 7 & 8


TABLE 14: GOLD2.SOL VRS WHITE2.SOL, TEST 9 & 10 & 11 & 12


TABLE 16: GOLD2.SOL VRS WHITE2.SOL, TEST 17 & 18 & 19 & 20


TABLE 18: GOLD2.SOL VRS WHITE2.SOL, TEST 25 & 26 & 27 & 28
Gold One Processor Versus Gold Two Processors Results


TABLE 20: GOLD1.SOL VRS GOLD2.SOL, TEST 1 & 2 & 3 & 4


TABLE 21: GOLD1.SOL VRS GOLD2.SOL, TEST 5 & 6 & 7 & 8


TABLE 22: GOLD1.SOL VRS GOLD2.SOL, TEST 9 & 10 & 11 & 12
TABLE 26: GOLD1.SOL VRS GOLD2.SOL, TEST 25 & 26 & 27 & 28



TABLE 27: GOLD1.SOL VRS GOLD2.SOL, TEST 29 & 30


Solaris 2.3 One Processor Versus SunOS 4.1.3 One Processor Results


TABLE 28: GOLD1.SOL VRS GOLD1.SUN, TEST 1 & 2 & 3 & 4



TABLE 29: GOLD1.SOL VRS GOLD1.SUN, TEST 5 & 6 & 7 & 8


TABLE 30: GOLD1.SOL VRS GOLD1.SUN, TEST 9 & 10 & 11 & 12
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TABLE  34:  GOLD1.SOL  VRS  GOLDl^UN,  TEST  25  &  26  &  27  &  28 


APPENDIX D: NTTCP SINGLE PROCESSOR RESULTS


TABLE 36: SINGLE PARAMETER TEST RESULTS


[Table body not legible in source. Surviving entries: test numbers, host names (Gold, White), thread counts (8, 16), TTRT values (5ms, 8ms, 11ms, 25ms) and LLC buffer sizes.]


TABLE 43: SINGLE PROCESSOR, 7TH TEST RESULTS
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TABLE  49:  SINGLE  PROCESSOR,  13TH  TEST  RESULTS 


(K bytes) | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From: White  Threads: 16  LLC Buffers: 48K

To: Gold  TTRT: 5ms  Single Test


TABLE 50: SINGLE PROCESSOR, 14TH TEST RESULTS


Mbps | Mbps


From: Gold  Threads: 16  LLC Buffers: 48K

To: White  TTRT: 5ms  Single Test


TABLE  51:  SINGLE  PROCESSOR,  15TH  TEST  RESULTS 


Mbps | Mbps | Mbps | Mbps | Mbps


From: White  Threads: 16  LLC Buffers: 48K

To: Gold  TTRT: 5ms  Dual Test


TABLE 55: SINGLE PROCESSOR, 19TH TEST RESULTS


(K bytes) | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From: White  Threads: 8  LLC Buffers: 48K

To: Gold  TTRT: 11ms  Dual Test


TABLE  56:  SINGLE  PROCESSOR,  20TH  TEST  RESULTS 


(K bytes) | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From:  Gold 
To:  White 


Threads:  8 
TTRT:  11ms 


LLC  Buffers:  48K 
Dual  Test 


TABLE  57:  SINGLE  PROCESSOR,  21ST  TEST  RESULTS 


Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From: White  Threads: 16  LLC Buffers: 48K

To: Gold  TTRT: 11ms  Single Test


TABLE 58: SINGLE PROCESSOR, 22ND TEST RESULTS


(K bytes) | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From: Gold  Threads: 16  LLC Buffers: 48K

To: White  TTRT: 11ms  Single Test


TABLE  59:  SINGLE  PROCESSOR,  23RD  TEST  RESULTS 


(K bytes) | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From: White  Threads: 16  LLC Buffers: 48K

To: Gold  TTRT: 11ms  Dual Test

TABLE  60:  SINGLE  PROCESSOR,  24TH  TEST  RESULTS 


(K bytes) | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From: Gold  Threads: 16  LLC Buffers: 48K

To: White  TTRT: 11ms  Dual Test

TABLE 61: SINGLE PROCESSOR, 25TH TEST RESULTS


(K bytes) | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From: White  Threads: 8  LLC Buffers: 48K

To: Gold  TTRT: 25ms  Single Test


TABLE  62:  SINGLE  PROCESSOR,  26TH  TEST  RESULTS 


(K bytes) | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From:  Gold  Threads:  8  LLC  Buffers:  48K 

To:  White  TTRT:  25ms  Single  Test 


TABLE 63: SINGLE PROCESSOR, 27TH TEST RESULTS


(K bytes) | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From: White  Threads: 8  LLC Buffers: 48K

To: Gold  TTRT: 25ms  Dual Test


TABLE 64: SINGLE PROCESSOR, 28TH TEST RESULTS


Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From: Gold  Threads: 8  LLC Buffers: 48K

To: White  TTRT: 25ms  Dual Test


TABLE  65:  SINGLE  PROCESSOR,  29TH  TEST  RESULTS 


(K bytes) | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From: White  Threads: 16  LLC Buffers: 48K

To: Gold  TTRT: 25ms  Single Test


TABLE  66:  SINGLE  PROCESSOR,  30TH  TEST  RESULTS 


(K bytes) | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From: Gold  Threads: 16  LLC Buffers: 48K

To: White  TTRT: 25ms  Single Test


TABLE 67: SINGLE PROCESSOR, 31ST TEST RESULTS


(K bytes) | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From: White  Threads: 16  LLC Buffers: 48K

To: Gold  TTRT: 25ms  Dual Test


TABLE 68: SINGLE PROCESSOR, 32ND TEST RESULTS


Mbps | Mbps | Mbps


From: Gold  Threads: 16  LLC Buffers: 48K

To: White  TTRT: 25ms  Dual Test


TABLE 69: SINGLE PROCESSOR, 33RD TEST RESULTS


(K bytes) | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From: White  Threads: 8  LLC Buffers: 48K

To: Gold  TTRT: 8ms  Single Test, FDDI Boards Switched


TABLE 70: SINGLE PROCESSOR, 34TH TEST RESULTS


(K bytes) | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps

[Table body not legible in source.]


From: Gold  Threads: 8  LLC Buffers: 48K

To: White  TTRT: 8ms  Single Test, FDDI Boards Switched


APPENDIX E: NTTCP TWO PROCESSORS RESULTS


TABLE 71: PARAMETERS USED FOR TWO PROCESSOR TEST


Test      | From  | To   | NFS asynch threads | TTRT | LLC Buffers | sbf_mtu
1st Test  | White | Gold | 8  | 8ms  | 48K | 4352
2nd Test  | White | Gold | 16 | 8ms  | 48K | 4352
3rd Test  | White | Gold | 8  | 5ms  | 48K | 4352
4th Test  | White | Gold | 16 | 5ms  | 48K | 4352
5th Test  | White | Gold | 8  | 11ms | 48K | 4352
6th Test  | White | Gold | 16 | 11ms | 48K | 4352
7th Test  | White | Gold | 8  | 25ms | 48K | 4352
8th Test  | White | Gold | 16 | 25ms | 48K | 4352
9th Test  | White | Gold | 8  | 8ms  | 56K | 4352
10th Test | White | Gold | 16 | 8ms  | 56K | 4352
11th Test | White | Gold | 8  | 5ms  | 56K | 4352
12th Test | White | Gold | 16 | 5ms  | 56K | 4352
13th Test | White | Gold | 8  | 11ms | 56K | 4352
14th Test | White | Gold | 16 | 11ms | 56K | 4352
15th Test | White | Gold | 8  | 25ms | 56K | 4352
16th Test | White | Gold | 16 | 25ms | 56K | 4352
17th Test | White | Gold | 8  | 8ms  | 40K | 4352
18th Test | White | Gold | 16 | 8ms  | 40K | 4352
19th Test | White | Gold | 8  | 5ms  | 40K | 4352
20th Test | White | Gold | 16 | 5ms  | 40K | 4352
21st Test | White | Gold | 8  | 11ms | 40K | 4352
22nd Test | White | Gold | 16 | 11ms | 40K | 4352
23rd Test | White | Gold | 8  | 25ms | 40K | 4352
24th Test | White | Gold | 16 | 25ms | 40K | 4352
25th Test | White | Gold | 8  | 8ms  | 48K | 4192
26th Test | White | Gold | 16 | 8ms  | 48K | 4192
27th Test | White | Gold | 8  | 5ms  | 48K | 4192
28th Test | White | Gold | 16 | 5ms  | 48K | 4192
29th Test | White | Gold | 8  | 11ms | 48K | 4192

TABLE  71:  PARAMETERS  USED  FOR  TWO  PROCESSOR  TEST 


sbf_mtu


34th  Test 


35th  Test 


36th  Test 


37th  Test 


38th  Test 


39th  Test 


40th  Test 


41st  Test 


42nd  Test 


43rd Test


44th Test


45th Test


46th  Test 


47th  Test 


48th  Test 
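
The legible rows of Table 71 follow a regular pattern: the thread count alternates fastest, then TTRT, then the LLC buffer size, and finally the MTU. The sketch below regenerates that ordering as a full factorial grid. It is an illustration only: the function and key names are hypothetical, and the rows that are not legible in the table (in particular the 40K buffer setting at the 4192-byte MTU) are assumed to continue the same pattern.

```python
from itertools import product

# Parameter values as they appear in Table 71, in the order they vary.
MTU_BYTES = [4352, 4192]      # slowest-varying parameter
LLC_BUFFERS_K = [48, 56, 40]
TTRT_MS = [8, 5, 11, 25]
NFS_THREADS = [8, 16]         # fastest-varying parameter


def parameter_grid():
    """Return one dict per test, numbered from 1, in Table 71 order.

    Assumption: tests whose rows are illegible in the source follow
    the same nesting as the legible ones.
    """
    combos = product(MTU_BYTES, LLC_BUFFERS_K, TTRT_MS, NFS_THREADS)
    return [
        {"test": n, "threads": threads, "ttrt_ms": ttrt,
         "llc_k": llc, "mtu": mtu}
        for n, (mtu, llc, ttrt, threads) in enumerate(combos, start=1)
    ]


if __name__ == "__main__":
    for row in parameter_grid()[:4]:
        print(row)
```

Under these assumptions the grid has 2 x 3 x 4 x 2 = 48 entries, which matches the 48 tests listed in the table.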


TABLE  72:  TWO  PROCESSORS,  1ST  TEST  RESULTS 


Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From: White  Threads: 8  LLC Buffers: 48K

To: Gold  TTRT: 8ms  MTU: 4352 Bytes
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TABLE  79:  TWO  PROCESSORS,  8TH  TEST  RESULTS 


Mbps | Mbps | Mbps | Mbps | Mbps


From: White
To: Gold


Threads: 16
TTRT: 25ms


LLC Buffers: 48K
MTU: 4352 Bytes


TABLE  80:  TWO  PROCESSORS,  9TH  TEST  RESULTS 


Window Size
(K bytes)


FUeB 

1  Mbps 

Mbps 

llt.33  60.07 


178.40  I  54.61 


163.84  49.13 


Mbps | Mbps | Mbps | Mbps | Mbps


30.34  30.66 


50.84 


49J2  49.52  50.05 


50.24  49.25 


46.60  46.57 


39.44  35.62 


30.22  26.43 


From:  White 
To:  Gold 


Threads:  8 
TTRT:  8ms 


LLC  Buffers:  56K 
MTU:  4352  Bytes 


TABLE 81: TWO PROCESSORS, 10TH TEST RESULTS


Window Size
(K bytes)


Mbps | Mbps | Mbps | Mbps


From: White
To: Gold


Threads: 16
TTRT: 8ms


LLC Buffers: 56K
MTU: 4352 Bytes


TABLE 82: TWO PROCESSORS, 11TH TEST RESULTS


TABLE 85: TWO PROCESSORS, 14TH TEST RESULTS


TABLE 88: TWO PROCESSORS, 17TH TEST RESULTS


Mbps | Mbps


From: White
To: Gold


Threads: 8
TTRT: 8ms


LLC Buffers: 40K
MTU: 4352 Bytes


TABLE  89:  TWO  PROCESSORS,  18TH  TEST  RESULTS 


Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


2V.I3  I  27.3J 


30.95  I  30.37 


143.21  I  49.15 


56.110  49.52 


363.18  38.23  50.97  I  5Z43 


3177  49.15  58J5 


385.93  54.61  54.61 


137.14  47.33 


27.31 


47.33  I  41.87  |  46.60 


31.86  40.78 


5143  48.06 


54.61  35.49 


48.06  20J15 


46.60  I  15.49 


35.54  27.72 


Ian; 


G 

ps  I  Mbps 


14.27  I  11.82 


From:  White 
To:  Gold 


Threads:  16 
TTRT: 8ms


LLC  Buffers:  40K 
MTU:  4352  Bytes 


TABLE  90:  TWO  PROCESSORS,  19TH  TEST  RESULTS 


Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


1017.63  I  5461 


240.30  49.15 


From: White  Threads: 8  LLC Buffers: 40K

To: Gold  TTRT: 5ms  MTU: 4352 Bytes
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TABLE  91:  TWO  PROCESSORS.  20TH  TEST  RESULTS 


TABLE 94: TWO PROCESSORS, 23RD TEST RESULTS


From:  White 
To:  Gold 


Threads:  16 
TTRT:  Sms 


LLC Buffers: 48K
MTU: 4192 Bytes


TABLE 103: TWO PROCESSORS, 32ND TEST RESULTS


(K bytes)


Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From: White
To: Gold


Threads: 16
TTRT: 25ms


LLC Buffers: 48K
MTU: 4192 Bytes


TABLE  104:  TWO  PROCESSORS,  33RD  TEST  RESULTS 


(K bytes)


Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


From: White
To: Gold


Threads: 8
TTRT: 8ms


LLC Buffers: 56K
MTU: 4192 Bytes


TABLE  105:  TWO  PROCESSORS,  34TH  TEST  RESULTS 


(K  bytes) 


Mbps | Mbps | Mbps | Mbps | Mbps | Mbps


4aOS  I  3&67 


26JI  2S.9I 


SZ79  I  50.97 


50.97  I  52.43 


50.97  50.97 


50.97  1  50.97 


4l.r7  I  49.52 


35.32  I  39.79 


From:  White 
To:  Gold 


Threads: 

TTRT: 


LLC Buffers: 56K
MTU: 4192 Bytes


TABLE 106: TWO PROCESSORS, 35TH TEST RESULTS


TABLE 109: TWO PROCESSORS, 38TH TEST RESULTS


TABLE 112: TWO PROCESSORS, 41ST TEST RESULTS


TABLE 115: TWO PROCESSORS, 44TH TEST RESULTS


TABLE 118: TWO PROCESSORS, 47TH TEST RESULTS


TABLE 121: TWO PROCESSORS, 50TH TEST RESULTS


APPENDIX  F:  GLOSSARY  OF  TERMS 


802.2  IEEE standard for the Logical Link Control.

ACK  Acknowledge. A network packet acknowledging the receipt of
data.

ARP  Address Resolution Protocol. A TCP/IP protocol to translate an IP
address into a MAC address.

ANSI  American National Standards Institute. A private organization that
coordinates some United States standards-making. Represents the
United States to the International Standards Organization.

ARPA  Advanced Research Projects Agency. A Department of Defense
agency that has helped fund many computer projects including
ARPANET, the Berkeley version of Unix and TCP/IP. ARPA used to
be known as DARPA.

ARPANET  Advanced Research Projects Agency Network. A Department of
Defense sponsored network of military and research organizations.
Replaced by the Defense Data Network (DDN).

ASIC  Application-Specific Integrated Circuits.

asynchronous  FDDI term for data transmission where all requests for service
contend for a pool of ring bandwidth.

bandwidth  The amount of data that can be moved through a particular
communications link. FDDI has a bandwidth of 100 Mb/s.

beacon  A token ring packet that signals a serious failure on the ring.

BER  Bit Error Rate.

bps  Bits per second. Transmission speed over some media.

CCITT  Comité Consultatif International Télégraphique et Téléphonique
(Consultative Committee for International Telephone and
Telegraph). Standards-making body administered by the
International Telecommunication Union.

DARPA  Defense Advanced Research Projects Agency. See ARPA.

DAS  Dual Attached Station. FDDI term for a node that is attached to
both the primary and secondary fiber optic cables (as opposed to a
node that is connected to the ring via a concentrator or is not dual
attached).

DDN  Defense Data Network. A network for the Department of Defense
and their contractors based on the TCP/IP and X.25 networking
protocols.

DLL  Data Link Layer.

DMA  Direct Memory Access. This is a device (controller) for controlling
the transfer of data directly to or from the memory without
involving the processor. The DMA controller becomes the bus
master and directs the reads or writes between itself and memory.

DNS  Domain Name System. A mechanism used in the Internet for
translating names of host computers into addresses. The DNS also
allows host computers not directly on the Internet to have registered
names in the same style.

FDDI  Fiber Distributed Data Interface. A 100 Mb/s fiber optic LAN
standard based on the token ring.

FTP  File Transfer Protocol. FTP is the Internet standard for file transfer.
FTP was designed from the start to work between different hosts,
running different operating systems and using different file
structures. RFC 959 is the official specification for FTP.

ICMP  Internet Control Message Protocol. ICMP is often considered part
of the IP layer. It communicates error messages and other
conditions that require attention. ICMP messages are transmitted
within IP datagrams. RFC 792 contains the official specification of
ICMP.

IEEE  Institute of Electrical and Electronics Engineers. A leading
standards-making body in the United States, responsible for the 802
standards for local area networks.

IGMP  Internet Group Management Protocol. IGMP lets all the systems
on a physical network know which hosts currently belong to which
multicast groups. This information is required by the multicast
routers, so they know which multicast datagrams to forward onto
which interfaces. IGMP is defined in RFC 1112.

Internet  A collection of networks that share the same namespace and use
the TCP/IP protocols.

IP  Internet Protocol. The network layer protocol for the Internet.

ISO  International Standards Organization.

LAN  Local area network. Usually refers to Ethernet or token ring
networks.

LLC  Logical Link Control. The upper portion of the data link layer,
defined in the IEEE 802.2 standard. The logical link control layer
presents a uniform interface to the user of the data link service,
usually a network layer. Underneath the LLC sublayer of the data
link layer is a Media Access Control (MAC) sublayer. The MAC
sublayer is responsible for taking a packet of data from the LLC
and submitting it to the particular data link being used.

MAC  Media Access Control. This layer provides fair and deterministic
access to the medium.

Mbps  Million bits per second. 2^20 bits of information (usually used to
express a data transfer rate; as in, 1 megabit/second = 1 Mbps).

MTU  Maximum transfer unit. The biggest piece of data that can be
transferred by the data link layer.

NAK  Negative acknowledgment. Response to nonreceipt or receipt of a
corrupt packet of information.

NFS  Network File System. A distributed file system developed by Sun
Microsystems and widely used on TCP/IP systems.

NIS  Network Information Service. Name service in the Sun Open
Network Computing (ONC) family.

NPI  Network Peripheral Inc. The manufacturer of the FDDI interface
cards used in this investigation on the Sun SPARC workstations.

NRZI  Nonreturn-to-Zero Inverted. NRZI is an example of differential
encoding. In differential encoding, the signal is decoded by
comparing the polarity of adjacent signal elements rather than
determining the absolute value of a signal element.

OSI  Open System Interconnection.

PCM  Physical Connection Management.
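
The NRZI entry above describes differential encoding: a bit is recovered by comparing adjacent signal elements, not by reading absolute levels. The following is a minimal sketch of that idea, assuming the convention commonly described for NRZI in which a 1 bit is signalled by a transition and a 0 bit by no transition. The function names and the two-level (0/1) line representation are illustrative only.

```python
def nrzi_encode(bits, initial_level=0):
    """Encode bits as line levels: toggle the level on a 1, hold it on a 0."""
    level = initial_level
    levels = []
    for bit in bits:
        if bit == 1:
            level ^= 1  # a 1 is sent as a transition
        levels.append(level)
    return levels


def nrzi_decode(levels, initial_level=0):
    """Recover bits by comparing each level with the one before it."""
    previous = initial_level
    bits = []
    for level in levels:
        bits.append(1 if level != previous else 0)
        previous = level
    return bits


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0]
    line = nrzi_encode(data)
    print(line, nrzi_decode(line))
```

Because only changes in level carry information, inverting every level on the line (together with the assumed starting level) decodes to the same bits, which is the practical benefit of comparing adjacent signal elements.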
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PHY 


Physical Layer. PHY provides the media independent functions
associated with the OSI physical layer.

PMD  Physical Medium Dependent Layer. PMD specifies the
transmitters, receivers and other associated hardware.

PROM  Programmable Read-Only Memory.

RARP  Reverse Address Resolution Protocol.

RISC  Reduced Instruction Set Computer. Generic name for CPUs that use
a simpler instruction set than more traditional designs. The Sun
SPARC workstation uses RISC technology.

SMT  Station Management document. This layer provides the capability
to monitor the FDDI network. SMT can provide services such as
node initialization, bypassing faulty nodes and recovery.

SPARC  Scalable Processor Architecture. A reduced instruction set (RISC)
processor developed by Sun and licensed by several vendors
including AT&T and Texas Instruments.

SUN  Stanford University Network. This name was given for a printed
circuit board developed in 1981 that was designed to run the UNIX
operating system.

TCP/IP  Transmission Control Protocol/Internet Protocol. This is a common
shorthand which refers to the suite of application and transport
protocols which run over IP. These include FTP, Telnet, SMTP, and
UDP.

THT  Token holding timer. Token ring and FDDI term for the amount of
time a node can transmit data before sending the token back out to
the ring.

TTRT  Target token rotation time. A term used in FDDI to set performance
parameters. The TTRT serves as a measure of expected delay and is
used, among other things, to set time-out parameters.

UDP  User Datagram Protocol.
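
The THT and TTRT entries are related through the timed-token rule. The sketch below shows one simplified reading of that rule: when the token arrives early (the measured rotation time is below the TTRT), the station may send asynchronous traffic for the unused remainder, and when it arrives late the station gets no asynchronous time. It is an assumption-laden illustration, not the FDDI MAC specification: it ignores synchronous allocations and late-count handling, and the function name is hypothetical.

```python
def async_holding_time_ms(ttrt_ms, measured_rotation_ms):
    """Asynchronous transmit budget (ms) granted on token arrival.

    Simplified rule: budget = TTRT - measured token rotation time,
    never less than zero.
    """
    return max(0.0, float(ttrt_ms) - float(measured_rotation_ms))


if __name__ == "__main__":
    # TTRT values taken from the test parameters used in this study.
    for ttrt in (5, 8, 11, 25):
        print(ttrt, async_holding_time_ms(ttrt, measured_rotation_ms=3.0))
```

Under this reading, a larger TTRT leaves more time for a station to hold the token on a lightly loaded ring, which is one reason TTRT was among the parameters varied in the tests.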
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